od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1
Overview
od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1 is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its primary distinction is its training with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. The model was fine-tuned on the OpenR1-Math-220k dataset, optimizing it for mathematical reasoning tasks.
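As a Qwen2.5-Instruct derivative, the model can be loaded with the standard Hugging Face `transformers` chat workflow. The sketch below is illustrative, not an official usage snippet from this card: the repo id comes from the card itself, while the generation settings and helper names (`build_messages`, `solve`) are assumptions.

```python
# Hedged sketch: running the model for inference with Hugging Face
# transformers. Generation settings are illustrative, not tuned values.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1"


def build_messages(question: str) -> list[dict]:
    """Wrap a math question in the chat format Qwen2.5-Instruct expects."""
    return [{"role": "user", "content": question}]


def solve(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate an answer.

    Requires the weights locally or network access to the Hub.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example usage (downloads ~1.5B parameters on first call):
# answer = solve("What is 17 * 24?")
```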
Key Capabilities
- Enhanced Mathematical Reasoning: Specialized training on a dedicated math dataset significantly improves its ability to understand and solve mathematical problems.
- GRPO Training: Utilizes the GRPO method, a technique designed to push the limits of mathematical reasoning in open language models, as detailed in the DeepSeekMath paper.
- Qwen2.5 Architecture: Benefits from the robust architecture of the Qwen2.5 series, providing a strong foundation for its specialized capabilities.
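The core idea behind GRPO, as described in the DeepSeekMath paper, is that it drops the learned value-function baseline of PPO: several completions are sampled per prompt, and each completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal numerical sketch of that normalization step, with illustrative function names not taken from any specific library:

```python
# Hedged sketch of GRPO's group-relative advantage: each sampled
# completion is scored, then normalized against the other completions
# drawn for the same prompt (zero mean, unit std within the group).
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize one prompt's per-completion rewards within the group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one math problem, scored 1.0 if the
# final answer is correct and 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions receive positive advantage, incorrect negative,
# so the policy update reinforces the better answers in each group.
```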
Good For
- Applications requiring strong mathematical problem-solving abilities.
- Research and development in improving LLM performance on quantitative tasks.
- Scenarios where a smaller, specialized model for math is preferred over larger, general-purpose models.