graf/Qwen3-1.7B-SFT-science-2e-5

TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Apr 17, 2026License:otherArchitecture:Transformer Cold

The graf/Qwen3-1.7B-SFT-science-2e-5 model is a fine-tuned version of the Qwen3-1.7B architecture, developed by Qwen, featuring approximately 1.7 billion parameters and a 32K context length. This model has been specifically fine-tuned on the dolci_science_train dataset, indicating an optimization for scientific text understanding and generation. Its primary use case is likely in scientific domains, leveraging its specialized training for improved performance in related tasks.

Loading preview...

Model Overview

This model, graf/Qwen3-1.7B-SFT-science-2e-5, is a specialized fine-tuned variant of the Qwen3-1.7B base model, developed by Qwen. It features approximately 1.7 billion parameters and supports a 32,768 token context length, making it suitable for processing substantial amounts of text.

Key Specialization

The model has undergone supervised fine-tuning (SFT) using the dolci_science_train dataset. This targeted training suggests an enhanced capability for tasks within scientific domains. The fine-tuning process aimed to adapt the general-purpose Qwen3-1.7B model to better understand and generate content relevant to scientific inquiry.

Training Details

Training was conducted with a learning rate of 2e-05, a batch size of 2 (accumulated to 128), and ran for 3.0 epochs. The training procedure utilized the AdamW optimizer with a cosine learning rate scheduler. Evaluation metrics show a final validation loss of 0.7464, indicating successful adaptation to the scientific dataset.

Potential Use Cases

Given its fine-tuning on a scientific dataset, this model is likely well-suited for applications requiring:

  • Scientific text analysis: Understanding and summarizing research papers, articles, or technical documents.
  • Information extraction: Identifying key concepts, entities, or relationships within scientific literature.
  • Scientific content generation: Assisting in drafting scientific explanations, hypotheses, or reports.

Limitations

As indicated in the original model card, further information regarding intended uses, limitations, and comprehensive training/evaluation data is needed for a complete understanding of its scope and performance boundaries.