Model Overview
Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT is a supervised fine-tuned (SFT) variant of the Qwen3-4B-Instruct model, developed by Harsha901. This 4-billion-parameter model is specifically optimized for mathematical reasoning and step-by-step problem solving, building on the Qwen3 architecture. It was fine-tuned using Unsloth and Hugging Face's TRL library, yielding approximately 2x faster training.
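As an instruct-tuned Qwen3 variant, the model expects chat-formatted prompts. A minimal sketch of the ChatML-style layout Qwen instruct models use is below; in practice you should call `tokenizer.apply_chat_template`, which applies the exact template shipped with the model, so this is illustrative only.

```python
# Illustrative sketch of the ChatML-style prompt layout used by Qwen
# instruct models. Prefer tokenizer.apply_chat_template in real code;
# this only shows the turn structure the template produces.

def build_chatml_prompt(system: str, user: str) -> str:
    """Format a system + user turn and open the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful math tutor. Reason step by step.",
    "If 3x + 5 = 20, what is x?",
)
print(prompt)
```

The final `<|im_start|>assistant` turn is left open so generation continues as the assistant's step-by-step answer.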
Key Capabilities
- Multi-step mathematical reasoning: Handles complex math problems requiring several logical steps.
- Algebra, arithmetic, and word problems: Proficient in various mathematical domains.
- Chain-of-thought style explanations: Generates clear, logically structured reasoning chains.
- Improved instruction adherence: Follows prompts precisely for consistent outputs.
- More stable reasoning: Offers enhanced reliability compared to its base model.
Training and Evaluation
The model was trained on a curated dataset of instruction-style math prompts and step-by-step solutions, emphasizing logical consistency and clear intermediate steps. While formal benchmark results are planned, qualitative evaluations show improved structured reasoning and more consistent intermediate steps compared to the base model.
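The instruction-plus-solution training format described above can be sketched as a data-mapping step for TRL's `SFTTrainer`, which accepts records in a conversational "messages" layout. The field names (`problem`, `solution`) and the sample record are placeholders, not the actual training data, which is not published here.

```python
# Sketch of converting instruction-style math records into the
# conversational "messages" format accepted by TRL's SFTTrainer.
# Field names ("problem", "solution") are illustrative assumptions.

def to_messages(record: dict) -> dict:
    """Map one instruction/solution record to a chat-style example."""
    return {
        "messages": [
            {"role": "user", "content": record["problem"]},
            {"role": "assistant", "content": record["solution"]},
        ]
    }

example = to_messages({
    "problem": "What is 12 * 8?",
    "solution": "12 * 8 = 96. The answer is 96.",
})
```

A dataset of such records can then be passed to `SFTTrainer` directly; keeping the full step-by-step solution in the assistant turn is what teaches the model its chain-of-thought style.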
Good For
- Math problem solving: Ideal for generating solutions with detailed explanations.
- Educational assistants: Can serve as a tool for teaching and learning mathematics.
- Reasoning benchmarks: Suitable for tasks requiring logical deduction and problem-solving.
- Downstream alignment: A strong foundation for further preference tuning (DPO / RLHF).
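For the downstream alignment use case, TRL's `DPOTrainer` consumes preference data as prompt/chosen/rejected triples. A hedged sketch of shaping one such pair; the example texts are invented, not outputs from this model.

```python
# Sketch of a single preference pair in the prompt/chosen/rejected
# layout that TRL's DPOTrainer consumes. The texts are invented
# examples for illustration only.

def make_preference_pair(prompt: str, chosen: str, rejected: str) -> dict:
    """Bundle one DPO training example."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pair = make_preference_pair(
    prompt="Solve: 7 + 6 * 2",
    chosen="Order of operations: 6 * 2 = 12, then 7 + 12 = 19. Answer: 19.",
    rejected="7 + 6 = 13, times 2 is 26. Answer: 26.",
)
```

Pairs like this, where the chosen completion reasons correctly and the rejected one does not, are the raw material for preference tuning on top of this SFT checkpoint.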
Limitations
The model's outputs are not guaranteed to be mathematically correct in all cases and should be verified for critical applications. Its reasoning-style outputs can be verbose, and it is not optimized for creative or non-technical writing.
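Since outputs should be verified, here is a minimal sketch of one common verification strategy: extract the final number from the model's reasoning trace and compare it to a trusted reference answer. The regex and tolerance are assumptions for illustration, not part of the model.

```python
import re

def last_number(text: str):
    """Extract the final numeric token from a reasoning trace, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def answers_match(model_output: str, reference: float, tol: float = 1e-6) -> bool:
    """Check the trace's final number against a trusted reference answer."""
    value = last_number(model_output)
    return value is not None and abs(value - reference) <= tol

# 3x = 15, so x = 5 -> final number 5 matches the reference 5.0
print(answers_match("3x = 15, so x = 5.", 5.0))
```

This only checks the final answer, not the intermediate steps, so it is a cheap first filter rather than a full correctness proof.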