open-sci/sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated
The open-sci/sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated model is an approximately 2-billion-parameter language model fine-tuned from ali-elganzory's Qwen3-1.7B-Base-DPO-Tulu3-decontaminated. It features a 32K context length and is specialized through supervised fine-tuning on the open_thoughts3-1.2_m_30000_samples dataset, making it best suited for tasks that match the domain and style of that data.
Model Overview
This model, sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated, is a specialized version of the Qwen3-1.7B-Base-DPO-Tulu3-decontaminated architecture, with approximately 2 billion parameters and a 32K token context length. It has undergone supervised fine-tuning (SFT) on the open_thoughts3-1.2_m_30000_samples dataset, tailoring it to tasks related to the content and style of that dataset.
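The model can be loaded with the standard Hugging Face transformers API. A minimal sketch, assuming the repository ID matches the model name above and that accelerate is installed for device_map="auto":

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-sci/sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated"

# Load tokenizer and weights; torch_dtype="auto" keeps the checkpoint's
# native precision, and device_map="auto" places layers on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```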
Training Details
The fine-tuning process used a learning rate of 4e-05 with a total training batch size of 128 across 32 devices. The optimizer was ADAMW_TORCH_FUSED, and a cosine learning rate scheduler with a warmup ratio of 0.1 was employed over 5 epochs. This configuration reflects a focused effort to adapt the base model's capabilities to the target dataset.
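For reference, the reported hyperparameters map onto a transformers TrainingArguments setup roughly as follows. This is a sketch, not the actual training script; the per-device batch size of 4 (128 total / 32 devices, with no gradient accumulation) is an assumption about how the total batch was split:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated",
    learning_rate=4e-5,
    per_device_train_batch_size=4,   # assumption: 128 total / 32 devices
    gradient_accumulation_steps=1,   # assumption: no accumulation
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
)
```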
Key Characteristics
- Base Model: Fine-tuned from ali-elganzory/Qwen3-1.7B-Base-DPO-Tulu3-decontaminated.
- Parameter Count: Approximately 2 billion parameters.
- Context Length: Supports a context window of 32,768 tokens (see the config check below).
- Fine-tuning Data: Specialized on the open_thoughts3-1.2_m_30000_samples dataset.
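The advertised context window can be confirmed programmatically from the checkpoint's config. A small sketch, assuming the Qwen-family max_position_embeddings field carries the context length:

```python
from transformers import AutoConfig

# Inspect the context window reported by the checkpoint's config.
config = AutoConfig.from_pretrained(
    "open-sci/sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated"
)
print(config.max_position_embeddings)  # expected to report the 32K window
```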
Potential Use Cases
This model is best suited for applications where its fine-tuning on the open_thoughts3-1.2_m_30000_samples dataset provides a distinct advantage. Developers should favor it for tasks that align with the domain, style, or content distribution of that training data, since its performance is optimized for such scenarios.
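A minimal generation sketch, reusing the tokenizer and model objects from the loading snippet above; the prompt is illustrative only, and decoding settings should be tuned per task:

```python
# Tokenize an illustrative prompt and generate a completion.
prompt = "Explain the trade-offs between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```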