graf/Qwen3-1.7B-SFT-medical-2e-5

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Apr 17, 2026License:otherArchitecture:Transformer0.0K Warm

The graf/Qwen3-1.7B-SFT-medical-2e-5 model is a 1.7 billion parameter Qwen3-based language model fine-tuned by graf. It is specifically optimized for medical applications, having been trained on the medical_o1_train dataset. This model demonstrates a validation loss of 1.4089, indicating its specialized performance in medical contexts. Its primary use case is in tasks requiring medical domain knowledge.

Loading preview...

Model Overview

The graf/Qwen3-1.7B-SFT-medical-2e-5 is a specialized language model built upon the Qwen3-1.7B architecture. This model has undergone supervised fine-tuning (SFT) specifically for medical applications, utilizing the medical_o1_train dataset.

Key Characteristics

  • Base Model: Qwen/Qwen3-1.7B, a 1.7 billion parameter model.
  • Domain Specialization: Fine-tuned on a medical dataset (medical_o1_train) to enhance performance in healthcare-related tasks.
  • Performance: Achieved a validation loss of 1.4089 during training, indicating its focused optimization for the medical domain.

Training Details

The model was trained with a learning rate of 2e-05, a batch size of 16, and a gradient accumulation of 8, resulting in an effective total batch size of 128. It utilized the ADAMW_TORCH_FUSED optimizer and a cosine learning rate scheduler over 3 epochs.

Intended Use Cases

This model is designed for applications requiring a deep understanding and generation of medical-related text. It is particularly suitable for tasks such as:

  • Medical text analysis
  • Information extraction from clinical notes
  • Supporting medical question-answering systems

Limitations

As with any specialized model, its performance outside the medical domain may be limited. Further information regarding specific intended uses and limitations is needed for a comprehensive understanding.