open-sci/sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated
The open-sci/sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated model is a 1.5-billion-parameter language model fine-tuned from ali-elganzory/Qwen2.5-1.5B-SFT-Tulu3-decontaminated. It was trained on the open_thoughts3-1.2_m_30000_samples dataset with a 32768-token context length, and is best suited for tasks that match the content and domain of that dataset.
Model Overview
This model, open-sci/sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated, is a 1.5-billion-parameter language model produced by supervised fine-tuning (SFT) of the ali-elganzory/Qwen2.5-1.5B-SFT-Tulu3-decontaminated base model.
Training Details
The model was fine-tuned on a local Hugging Face cache copy of the open_thoughts3-1.2_m_30000_samples dataset (path: /gpfs/scratch/ehpc524/ot/hf_hub/datasets/open_thoughts_open_thoughts3-1.2_m_30000_samples/default/0.0.0/f679a5c592c8dffb). Key training hyperparameters, also expressed as a configuration sketch after this list, included:
- Learning Rate: 4e-05
- Batch Size: 1 per device (train), 8 per device (eval)
- Gradient Accumulation: 4 steps, for a total effective batch size of 128 across devices
- Optimizer: ADAMW_TORCH_FUSED
- Scheduler: Cosine with 0.1 warmup ratio
- Epochs: 5.0
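The listed values can be expressed with the Hugging Face transformers TrainingArguments API. This is a minimal sketch of such a configuration, not the original training script; output_dir is a placeholder, and any settings not listed above (precision, logging, evaluation strategy) are omitted.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; output_dir is a placeholder,
# not taken from the original run.
training_args = TrainingArguments(
    output_dir="sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated",
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size of 128 across devices
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```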
Intended Use
Given its fine-tuning on a specific dataset, this model is best suited for tasks and applications that align with the content and domain of the open_thoughts3-1.2_m_30000_samples dataset. Its 1.5-billion-parameter size keeps deployment costs modest, and its 32768-token context length allows it to handle long inputs.
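The snippet below is a minimal loading and generation example, assuming the checkpoint is available on the Hugging Face Hub under the model ID above and uses the standard Qwen2.5 chat template; the prompt is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-sci/sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat-formatted prompt (assumes the tokenizer ships a chat template).
messages = [{"role": "user", "content": "Explain gradient accumulation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```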