asparius/Qwen2.5-1.5B-GRPO-1ep-iter2
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 24, 2025 · Architecture: Transformer
asparius/Qwen2.5-1.5B-GRPO-1ep-iter2 is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B. It was trained with the GRPO method on the DigitalLearningGmbH/MATH-lighteval dataset, specializing it for mathematical reasoning and complex quantitative problem solving.
Model Overview
This model, asparius/Qwen2.5-1.5B-GRPO-1ep-iter2, is a specialized 1.5 billion parameter language model derived from the Qwen/Qwen2.5-1.5B base model. Its distinguishing feature is its fine-tuning: GRPO training on the DigitalLearningGmbH/MATH-lighteval dataset.
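For reference, a minimal sketch of loading the model with the Hugging Face transformers library. The `torch.bfloat16` dtype matches the BF16 quantization listed above; `device_map="auto"` (which requires accelerate) is a common convenience, not a requirement of this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asparius/Qwen2.5-1.5B-GRPO-1ep-iter2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",           # automatic placement via accelerate
)
```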
Key Capabilities
- Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. This approach is designed to improve a model's ability to work through complex mathematical problems and multi-step reasoning (a training sketch follows this list).
- Specialized Fine-tuning: By focusing on a dedicated mathematical dataset, this model aims to provide more accurate and reliable outputs for quantitative tasks compared to general-purpose language models of similar size.
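To illustrate how such a model might be produced, here is a minimal sketch of GRPO fine-tuning using TRL's `GRPOTrainer`. This is not the author's actual training code: the dataset column names, the toy reward function, and every hyperparameter are assumptions for illustration only.

```python
# Hypothetical GRPO fine-tuning sketch, not the recipe actually used to
# produce asparius/Qwen2.5-1.5B-GRPO-1ep-iter2.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column; MATH-style datasets typically
# store the question under "problem" (assumed here).
dataset = load_dataset("DigitalLearningGmbH/MATH-lighteval", split="train")
dataset = dataset.map(lambda x: {"prompt": x["problem"]})

def correctness_reward(completions, solution, **kwargs):
    """Toy reward: 1.0 if the reference solution text appears verbatim in
    the completion, else 0.0. A real math reward would parse and compare
    final answers (e.g. the \\boxed{...} expression)."""
    return [1.0 if sol.strip() in comp else 0.0
            for comp, sol in zip(completions, solution)]

training_args = GRPOConfig(
    output_dir="Qwen2.5-1.5B-GRPO",
    num_train_epochs=1,          # "1ep" in the model name suggests one epoch
    max_completion_length=512,   # room for step-by-step derivations
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B",   # the base model named on this card
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In GRPO, the trainer samples a group of completions per prompt, scores each with the reward function, and computes advantages relative to the group, which is what removes the need for a separate value model.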
Ideal Use Cases
- Mathematical Problem Solving: Well suited to applications that involve solving equations, working through proofs, or performing logical deductions (see the inference example after this list).
- Educational Tools: Can be integrated into platforms for teaching or assisting with mathematics.
- Research in Mathematical AI: Useful for researchers exploring advanced mathematical reasoning in language models.
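Because the base Qwen/Qwen2.5-1.5B is a base (non-instruct) model, the safest assumption is plain completion-style prompting. Continuing from the loading sketch above, the "Problem:/Solution:" framing below is purely illustrative, not a documented template:

```python
# Continues from the loading sketch above (tokenizer, model already defined).
# The prompt format is an illustrative assumption.
prompt = "Problem: Solve for x: 2x + 3 = 11.\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # budget for a short step-by-step derivation
    do_sample=False,     # greedy decoding for reproducible answers
)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```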