Kudod/NuminaMath-Qwen2.5-1.5B-GRPO-test-v1
Kudod/NuminaMath-Qwen2.5-1.5B-GRPO-test-v1 is a 1.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model utilizes the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its mathematical reasoning capabilities. With a context length of 131072 tokens, it is optimized for tasks requiring advanced mathematical problem-solving and logical deduction.
Loading preview...
Model Overview
Kudod/NuminaMath-Qwen2.5-1.5B-GRPO-test-v1 is a specialized language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its primary distinction lies in its training methodology, which incorporates GRPO (Gradient-based Reward Policy Optimization). This technique, detailed in the DeepSeekMath paper, is designed to significantly improve a model's proficiency in mathematical reasoning.
Key Capabilities
- Enhanced Mathematical Reasoning: The model's training with GRPO specifically targets and improves its ability to understand and solve complex mathematical problems.
- Qwen2.5 Architecture: Built upon the Qwen2.5-1.5B-Instruct foundation, it inherits the general language understanding and generation capabilities of the Qwen family.
- Extended Context Length: Features a substantial context window of 131072 tokens, allowing it to process and reason over lengthy problem descriptions or complex mathematical proofs.
Training Details
The model was fine-tuned using the TRL framework (Transformer Reinforcement Learning), leveraging specific versions of libraries including TRL 0.25.1, Transformers 4.57.1, and Pytorch 2.9.1. The GRPO method, central to its mathematical performance, is a key innovation from the DeepSeekMath research.
Ideal Use Cases
This model is particularly well-suited for applications requiring robust mathematical problem-solving, logical deduction, and handling extensive textual context in technical or scientific domains. Its GRPO-enhanced training makes it a strong candidate for tasks where precise mathematical understanding is critical.