guangyangnlp/Qwen3-1.7B-SFT-science-2e-5
The guangyangnlp/Qwen3-1.7B-SFT-science-2e-5 model is a 1.7 billion parameter Qwen3-based language model, fine-tuned on the dolci_science_train dataset. This specialization tailors it to scientific text generation and understanding, and its 32K-token context window lets it process long scientific documents and complex queries.
Model Overview
The guangyangnlp/Qwen3-1.7B-SFT-science-2e-5 is a specialized language model derived from the Qwen3-1.7B architecture. It has been fine-tuned with a learning rate of 2e-05 over 3 epochs on the dolci_science_train dataset, aiming to enhance its capabilities in scientific contexts. The model maintains a 1.7 billion parameter count and supports a 32,768 token context length.
Key Characteristics
- Base Model: Qwen/Qwen3-1.7B, a robust foundation for language understanding.
- Fine-tuning Focus: Specifically trained on the dolci_science_train dataset, indicating an optimization for scientific text processing.
- Training Parameters: Utilized the adamw_torch_fused optimizer, a cosine learning rate scheduler, and a total batch size of 128.
- Performance: Reached a final validation loss of 0.7490 during training, indicating a good fit to its target domain.
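The reported hyperparameters can be gathered in one place for reference. This is a sketch: the key names mirror Hugging Face `TrainingArguments` fields, and the per-device batch size and gradient accumulation split are assumptions — only their product (a total batch size of 128) is reported on the card.

```python
# Reported fine-tuning hyperparameters from the model card, as a plain dict.
# Key names follow Hugging Face TrainingArguments conventions.
hyperparams = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine",
    # Only the total batch size of 128 is reported; this particular
    # per-device / accumulation split is an assumed example.
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 16,
}

# Effective batch size = per-device batch size * gradient accumulation steps.
total_batch_size = (
    hyperparams["per_device_train_batch_size"]
    * hyperparams["gradient_accumulation_steps"]
)  # matches the reported total of 128
```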
Potential Use Cases
This model is particularly suited for applications requiring a strong understanding and generation of scientific content. While specific intended uses and limitations are not detailed in the original README, its fine-tuning on a science-specific dataset suggests utility in areas such as:
- Scientific Text Analysis: Summarizing research papers and extracting key information from scientific articles.
- Question Answering: Answering queries related to scientific topics.
- Content Generation: Drafting scientific explanations, reports, or educational materials.
Further evaluation would be needed to determine its precise strengths and limitations within various scientific sub-domains.
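For such evaluation or everyday use, the checkpoint can presumably be loaded through the standard Hugging Face Transformers interface, as with other Qwen3 chat models. The function name and prompt below are illustrative, not part of the original card; this is a minimal sketch under that assumption.

```python
MODEL_ID = "guangyangnlp/Qwen3-1.7B-SFT-science-2e-5"

def answer_science_question(question: str, max_new_tokens: int = 256) -> str:
    """Illustrative helper: generate an answer to a scientific question."""
    # Imported lazily so the sketch can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Format the question with the model's chat template.
    messages = [{"role": "user", "content": question}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

A call such as `answer_science_question("Why does ice float on water?")` would then return the model's generated explanation.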