open-sci/sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated
The open-sci/sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated model is an approximately 2-billion-parameter language model fine-tuned from ali-elganzory/Qwen3-1.7B-Base-SFT-Tulu3-decontaminated. It was trained on the open_thoughts3-1.2_m_30000_samples dataset, which suggests it is tuned for conversational or instruction-following tasks. With a 32K-token context length, the model is suitable for applications that need to process longer text sequences.
Model Overview
This model, open-sci/sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated, is a fine-tuned variant of the ali-elganzory/Qwen3-1.7B-Base-SFT-Tulu3-decontaminated base model. It features approximately 2 billion parameters and supports a context length of 32,768 tokens, enabling it to handle extensive textual inputs and outputs.
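The checkpoint can be loaded with the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming the repository id above resolves on the Hub and that torch and accelerate are installed; it is not an official usage example from the model authors.

```python
# Minimal loading sketch using the Hugging Face transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-sci/sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # requires accelerate; places weights on available GPU(s)/CPU
)
```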
Training Details
The model was fine-tuned on a 30,000-sample subset of the OpenThoughts3 data, stored locally at /gpfs/scratch/ehpc524/ot/hf_hub/datasets/open_thoughts_open_thoughts3-1.2_m_30000_samples/default/0.0.0/f679a5c592c8dffb. Key training hyperparameters included a learning rate of 4e-05, a total train batch size of 128 (32 devices with 4 gradient accumulation steps, i.e. a per-device batch size of 1), and 5 epochs. The optimizer was ADAMW_TORCH_FUSED with a cosine learning-rate schedule and a warmup ratio of 0.1.
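The training script itself is not included with the card, but the reported hyperparameters map naturally onto transformers.TrainingArguments. The following is a hypothetical reconstruction for reference only; the output path and the per-device batch size of 1 are inferred rather than confirmed.

```python
# Hypothetical reconstruction of the reported hyperparameters; not the
# authors' actual training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft__ot30k",            # placeholder output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,      # inferred: 32 devices x 4 accumulation steps x 1 = 128 total
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                   # assumption: "0.1 warmup" read as a warmup ratio
)
```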
Potential Use Cases
Given its supervised fine-tuning on a sample drawn from the OpenThoughts3 dataset, this model is potentially well suited for:
- Instruction following and dialogue generation: its SFT (Supervised Fine-Tuning) lineage suggests improved performance when responding to explicit instructions or engaging in conversational exchanges (see the generation sketch after this list).
- Long-context understanding: The 32K context window makes it capable of processing and generating coherent text over extended passages, useful for summarization or detailed content creation.
- Research and experimentation: As a fine-tuned model, it provides a base for further domain-specific adaptation or exploration of its learned capabilities.
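As a concrete illustration of the instruction-following use case, the sketch below reuses the model and tokenizer from the loading example above and assumes the tokenizer ships a chat template, as Qwen3-derived checkpoints typically do; the prompt and generation settings are placeholders.

```python
# Reuses `model` and `tokenizer` from the loading sketch above.
messages = [
    {"role": "user", "content": "Summarize the key points of the text above in three bullets."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```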