AmirMohseni/qwen-2.5-math-1.5b-dsr-sub-v2
AmirMohseni/qwen-2.5-math-1.5b-dsr-sub-v2 is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-Math-1.5B. It specializes in mathematical reasoning and was trained with the GRPO method. It builds on the Qwen2.5 architecture, with a listed context length of 32768 tokens, and is intended for applications that need mathematical problem solving.
Model Overview
This model, AmirMohseni/qwen-2.5-math-1.5b-dsr-sub-v2, is a specialized 1.5 billion parameter language model fine-tuned from the base Qwen/Qwen2.5-Math-1.5B model. It has been trained using the TRL framework.
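For readers unfamiliar with TRL, the sketch below shows roughly what a GRPO fine-tuning run with TRL's GRPOTrainer looks like. It is not the author's training script: the card does not state the dataset, reward function, or hyperparameters, so the toy dataset, the `boxed_answer_reward` function, the `answer` column, and the config values are all hypothetical placeholders.

```python
import re

BASE_MODEL = "Qwen/Qwen2.5-Math-1.5B"  # base model named in this card


def boxed_answer_reward(completions, answer, **kwargs):
    """Hypothetical reward: 1.0 if the last \\boxed{...} matches the reference.

    TRL passes the generated completions plus any extra dataset columns
    (here an assumed "answer" column) as keyword arguments.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
        predicted = matches[-1].strip() if matches else None
        rewards.append(1.0 if predicted == str(reference).strip() else 0.0)
    return rewards


def train():
    # Imported lazily so the reward function can be used without TRL installed.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    # Toy data for illustration only; the real training set is not documented.
    dataset = Dataset.from_list(
        [
            {"prompt": "What is 2 + 2? Put the answer in \\boxed{}.", "answer": "4"},
            {"prompt": "What is 3 * 5? Put the answer in \\boxed{}.", "answer": "15"},
        ]
    )
    config = GRPOConfig(
        output_dir="grpo-sketch",        # hypothetical output path
        num_generations=2,               # completions sampled per prompt
        per_device_train_batch_size=2,
    )
    trainer = GRPOTrainer(
        model=BASE_MODEL,
        reward_funcs=boxed_answer_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

Calling `train()` downloads the base model and starts training, so it is left uncalled here. The reward function can be tried on its own with plain strings.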
Key Capabilities
- Enhanced Mathematical Reasoning: The model targets mathematical tasks and was fine-tuned with GRPO (Group Relative Policy Optimization).
- GRPO Training: Training utilized the GRPO method, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), to improve its mathematical problem-solving abilities.
- Qwen2.5 Architecture: Built upon the Qwen2.5 family, it inherits a robust base for language understanding and generation.
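The core idea of GRPO can be illustrated with a small calculation. For each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized against the group: A_i = (r_i - mean(r)) / std(r). The sketch below is a simplified illustration under assumptions, not the TRL implementation: it uses the population standard deviation and a small `eps` to avoid division by zero, and the example rewards are made up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt,
# e.g. 1.0 for a correct final answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average get a positive advantage and those below get a negative one. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.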
When to Use This Model
This model is particularly well-suited for use cases that require:
- Mathematical Problem Solving: Ideal for applications demanding accurate and robust mathematical reasoning.
- Research in Mathematical LLMs: Useful for researchers exploring advanced training techniques like GRPO for specialized domains.
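A minimal sketch of running the model on a math problem with the Transformers library follows. The prompt wording in `build_prompt` (step-by-step reasoning with a boxed final answer) is an assumption borrowed from common practice with Qwen math models; this card does not document a required prompt format, and `solve` and `build_prompt` are hypothetical helper names.

```python
MODEL_ID = "AmirMohseni/qwen-2.5-math-1.5b-dsr-sub-v2"


def build_prompt(problem: str) -> str:
    """Assumed prompt format; adjust if the model expects something else."""
    return (
        f"{problem.strip()}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("What is the sum of the first 10 positive integers?")` downloads the weights on first use and returns the generated solution text.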
Technical Details
The model was trained with the following framework versions: TRL 0.22.0.dev0, Transformers 4.55.4, PyTorch 2.7.1, Datasets 4.0.0, and Tokenizers 0.21.4.
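To see how a local environment compares with the versions listed above, a small check such as the following may help. It is only a convenience sketch: the PyPI package names (for example `torch` for PyTorch) are assumptions, and matching versions are not known to be required for inference.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this card, keyed by assumed PyPI package name.
TRAINING_VERSIONS = {
    "trl": "0.22.0.dev0",
    "transformers": "4.55.4",
    "torch": "2.7.1",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected):
    """Return {package: (wanted, installed, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu126" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


for name, (wanted, installed, matches) in compare_versions(TRAINING_VERSIONS).items():
    status = "ok" if matches else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```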