XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT
XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained on the XinnanZhang/openthoughts3-math-50k8 dataset, which targets mathematical reasoning tasks. The model supports a 32K-token context length, making it suitable for longer mathematical problems and multi-step logical sequences.
Model Overview
This model, XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT, is a specialized version of the Qwen3-1.7B-Base architecture. It has been fine-tuned with a focus on enhancing its capabilities in mathematical reasoning.
Key Characteristics
- Base Model: Derived from Qwen/Qwen3-1.7B-Base.
- Parameter Count: 1.7 billion parameters.
- Context Length: Supports a context window of 32,768 tokens.
- Specialized Training: Fine-tuned on the XinnanZhang/openthoughts3-math-50k8 dataset, which suggests an emphasis on mathematical problem-solving and logical thought processes.
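A minimal loading-and-generation sketch using Hugging Face `transformers`. The repo id is taken from this card; the dtype, device mapping, prompt, and generation settings are illustrative assumptions, not documented defaults:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; adjust for your hardware
    device_map="auto",
)

# Assumed prompt style: a plain step-by-step math request.
prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```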
Training Details
The model was trained using the following hyperparameters:
- Learning Rate: 8e-05
- Optimizer: ADAMW_TORCH_FUSED
- Epochs: 1.0
- Batch Size: A total training batch size of 512 (distributed across 8 devices).
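A global batch of 512 across 8 devices typically decomposes as per-device batch × gradient-accumulation steps × device count. The card does not state the split, so the per-device batch and accumulation values below are assumptions chosen only to show the arithmetic:

```python
# Illustrative decomposition of the global batch size.
# Only num_devices (8) and the global total (512) come from this card;
# the per-device batch and accumulation steps are assumed values.
num_devices = 8
per_device_batch = 8     # assumption
grad_accum_steps = 8     # assumption
global_batch = per_device_batch * grad_accum_steps * num_devices
print(global_batch)  # → 512
```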
Potential Use Cases
Given its specialized training, this model is likely well-suited for:
- Mathematical Problem Solving: Assisting with or solving complex math problems.
- Logical Reasoning: Tasks requiring structured thought and step-by-step deduction.
- Educational Tools: Applications in tutoring or generating explanations for mathematical concepts.