graf/Qwen3-4B-SFT-science-1e-5

Text generation | Model size: 4B | Quantization: BF16 | Context length: 32k | Published: May 9, 2026 | License: other | Architecture: Transformer

graf/Qwen3-4B-SFT-science-1e-5 is a 4-billion-parameter language model fine-tuned by graf from the Qwen3-4B base model. It has a context length of 32768 tokens and is optimized for science-related tasks, having been fine-tuned on the dolci_science_train dataset. The model is intended for applications that require specialized knowledge within scientific domains.
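For reference, a minimal loading sketch with the Hugging Face transformers library. It assumes the checkpoint is hosted on the Hub under this ID and that a recent transformers version with Qwen3 support is installed; it is not an official usage snippet from the model author.

```python
# Minimal sketch: load the checkpoint with Hugging Face transformers.
# Assumes the Hub ID below resolves and that Qwen3 support is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "graf/Qwen3-4B-SFT-science-1e-5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",           # requires the accelerate package
)
```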


Overview

This model, graf/Qwen3-4B-SFT-science-1e-5, is a specialized version of the 4-billion-parameter Qwen3-4B base model. It was fine-tuned by graf on the dolci_science_train dataset, indicating a focus on scientific-domain understanding and generation. Training used a learning rate of 1e-5 over 3 epochs, with a total batch size of 128 across 4 GPUs.
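The training script itself is not published. The sketch below only reconstructs the hyperparameters reported above using TRL's SFTTrainer; the dataset's Hub path, the split, and the base-model ID are assumptions, not facts from the card.

```python
# Hypothetical reconstruction of the reported SFT setup with TRL.
# Dataset path, split, and base-model ID are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("dolci_science_train", split="train")  # assumed Hub path

config = SFTConfig(
    output_dir="Qwen3-4B-SFT-science-1e-5",
    learning_rate=1e-5,              # from the card
    num_train_epochs=3,              # from the card
    per_device_train_batch_size=32,  # 32 per device x 4 GPUs = total batch 128
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",  # assumed Hub ID for the Qwen3-4B base model
    args=config,
    train_dataset=dataset,
)
trainer.train()
```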

Key Characteristics

  • Base Model: Qwen3-4B.
  • Parameter Count: 4 billion parameters.
  • Context Length: Supports a context window of 32768 tokens.
  • Fine-tuning Focus: Specialized for science-related tasks through training on the dolci_science_train dataset.
  • Training Performance: Achieved a final validation loss of 0.6816 during fine-tuning.

Intended Use Cases

This model is best suited for applications that require processing or generating content within scientific fields. Its fine-tuning on a science-specific dataset suggests improved performance for tasks such as the following (a usage sketch appears after the list):

  • Answering scientific questions.
  • Summarizing scientific texts.
  • Generating scientific explanations or reports.
  • Assisting with scientific research-related queries.
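As a usage illustration, the sketch below reuses the `model` and `tokenizer` objects from the loading example above and assumes the tokenizer ships the standard Qwen3 chat template; the prompt is only an example.

```python
# Usage sketch: ask a science question via the chat template.
# Reuses `model` and `tokenizer` from the loading example above.
messages = [
    {"role": "user", "content": "Explain why the sky appears blue."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```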