graf/Qwen3-4B-SFT-science-1e-5
graf/Qwen3-4B-SFT-science-1e-5 is a 4-billion-parameter language model fine-tuned by graf from the Qwen3-4B base model. It supports a context length of 32768 tokens and was fine-tuned on the dolci_science_train dataset, optimizing it for science-related tasks that require specialized domain knowledge and understanding.
Overview
This model, graf/Qwen3-4B-SFT-science-1e-5, is a specialized version of the 4-billion-parameter Qwen3-4B base model. It was fine-tuned by graf on the dolci_science_train dataset, targeting scientific-domain understanding and generation. Training used a learning rate of 1e-05 over 3 epochs, with an effective batch size of 128 across 4 GPUs.
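The exact training script is not published on this card. As a rough illustration only, the sketch below shows one plausible way the stated hyperparameters could map onto TRL's SFTConfig/SFTTrainer. Only the learning rate (1e-05), epoch count (3), effective batch size (128 on 4 GPUs), and base model name come from this card; the per-device/accumulation split, precision, and dataset loading path are assumptions.

```python
# Hypothetical reconstruction of the stated training setup with TRL's SFTTrainer.
# Only learning_rate, num_train_epochs, and the effective batch size (128) are
# taken from this card; everything else is an illustrative assumption.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumption: dolci_science_train is available as a local JSONL file;
# the actual dataset location is not given on this card.
train_dataset = load_dataset("json", data_files="dolci_science_train.jsonl", split="train")

config = SFTConfig(
    output_dir="Qwen3-4B-SFT-science-1e-5",
    learning_rate=1e-5,               # stated learning rate
    num_train_epochs=3,               # stated number of epochs
    per_device_train_batch_size=8,    # assumption: 8 per GPU x 4 GPUs x 4 accum steps = 128
    gradient_accumulation_steps=4,
    bf16=True,                        # assumption: a common choice at this scale
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",            # base model named on this card
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```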
Key Characteristics
- Base Model: Qwen3-4B architecture.
- Parameter Count: 4 billion parameters.
- Context Length: Supports a context window of 32768 tokens.
- Fine-tuning Focus: Specialized for science-related tasks through training on the dolci_science_train dataset.
- Training Performance: Achieved a final validation loss of 0.6816 during fine-tuning.
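For inference, the standard Hugging Face transformers chat workflow should apply. Below is a minimal sketch, assuming the repo id shown on this card and the tokenizer's built-in chat template; the example question is only illustrative.

```python
# Minimal inference sketch with Hugging Face transformers.
# Assumes the repo id from this card and the tokenizer's built-in chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "graf/Qwen3-4B-SFT-science-1e-5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Why do noble gases rarely form compounds?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```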
Intended Use Cases
This model is best suited for applications that process or generate content in scientific fields. Its fine-tuning on a science-specific dataset suggests improved performance on tasks such as the following (a brief prompting sketch follows the list):
- Answering scientific questions.
- Summarizing scientific texts.
- Generating scientific explanations or reports.
- Assisting with scientific research-related queries.
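As an illustration of the summarization use case, here is a hedged sketch using the transformers text-generation pipeline; the abstract string is a placeholder to be replaced with real input text.

```python
# Illustrative summarization prompt via the text-generation pipeline.
# The abstract below is a placeholder; substitute real input text.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="graf/Qwen3-4B-SFT-science-1e-5",
    device_map="auto",
)

abstract = "(paste a scientific abstract here)"
messages = [
    {"role": "user", "content": f"Summarize this abstract in two sentences:\n\n{abstract}"}
]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```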