thangvip/qwen3-1.7b-dspo-sft-base
The thangvip/qwen3-1.7b-dspo-sft-base is a 1.7 billion parameter Qwen3-based language model, fine-tuned from thangvip/qwen3-1.7b-base-sft-math-1500. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced mathematical problem-solving and logical deduction.
Loading preview...
Model Overview
The thangvip/qwen3-1.7b-dspo-sft-base is a 1.7 billion parameter language model built upon the Qwen3 architecture. It is a fine-tuned iteration of the thangvip/qwen3-1.7b-base-sft-math-1500 model, specifically enhanced through a training process utilizing the TRL framework.
Key Capabilities
- Mathematical Reasoning: The model's training incorporates the GRPO method, as introduced in the "DeepSeekMath" paper, indicating a strong focus on improving mathematical problem-solving and reasoning skills.
- Instruction Following: As an instruction-tuned model, it is designed to respond effectively to user prompts and questions, as demonstrated by the quick start example.
Training Details
This model was trained using the TRL library, a framework for Transformer Reinforcement Learning. The application of the GRPO method, detailed in the DeepSeekMath paper, suggests an emphasis on advanced mathematical and logical processing during its fine-tuning phase.
When to Use This Model
This model is particularly suitable for applications requiring robust mathematical reasoning and accurate responses to complex, instruction-based queries. Its specialized training makes it a strong candidate for tasks where precise logical and numerical understanding is critical.