graf/Qwen3-1.7B-SFT-medical-2e-5
The graf/Qwen3-1.7B-SFT-medical-2e-5 model is a 1.7 billion parameter Qwen3-based language model fine-tuned by graf. It is specifically optimized for medical applications, having been trained on the medical_o1_train dataset. This model demonstrates a validation loss of 1.4089, indicating its specialized performance in medical contexts. Its primary use case is in tasks requiring medical domain knowledge.
Loading preview...
Model Overview
The graf/Qwen3-1.7B-SFT-medical-2e-5 is a specialized language model built upon the Qwen3-1.7B architecture. This model has undergone supervised fine-tuning (SFT) specifically for medical applications, utilizing the medical_o1_train dataset.
Key Characteristics
- Base Model: Qwen/Qwen3-1.7B, a 1.7 billion parameter model.
- Domain Specialization: Fine-tuned on a medical dataset (
medical_o1_train) to enhance performance in healthcare-related tasks. - Performance: Achieved a validation loss of 1.4089 during training, indicating its focused optimization for the medical domain.
Training Details
The model was trained with a learning rate of 2e-05, a batch size of 16, and a gradient accumulation of 8, resulting in an effective total batch size of 128. It utilized the ADAMW_TORCH_FUSED optimizer and a cosine learning rate scheduler over 3 epochs.
Intended Use Cases
This model is designed for applications requiring a deep understanding and generation of medical-related text. It is particularly suitable for tasks such as:
- Medical text analysis
- Information extraction from clinical notes
- Supporting medical question-answering systems
Limitations
As with any specialized model, its performance outside the medical domain may be limited. Further information regarding specific intended uses and limitations is needed for a comprehensive understanding.