Model Overview
jhn9803/Qwen2.5-MATH-1.5B-Instruct-DAPO-G8 is a 1.5-billion-parameter instruction-tuned model built on the Qwen2.5-Math-1.5B-Instruct base. It is specialized for mathematical reasoning through fine-tuning on the jhn9803/hendrycks-math-with-answers dataset.
Key Capabilities
- Mathematical Reasoning: Optimized for solving mathematical problems, leveraging a dataset specifically curated for this purpose.
- GRPO Training: Trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to enhance its mathematical problem-solving abilities.
- Instruction Following: Designed to follow instructions effectively, making it suitable for interactive mathematical tasks.
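As an instruction-tuned Qwen2.5 model, it expects chat-formatted prompts; in practice you would call the tokenizer's apply_chat_template rather than building strings by hand. A minimal sketch of the ChatML-style format Qwen2.5 chat templates render to (the step-by-step system prompt shown is the one commonly recommended for Qwen2.5-Math, included here as an illustrative assumption):

```python
def build_chatml_prompt(
    question: str,
    system: str = "Please reason step by step, and put your final answer within \\boxed{}.",
) -> str:
    """Render a single-turn prompt in the ChatML-style format used by Qwen2.5 chat templates.

    Illustrative only: in real use, tokenizer.apply_chat_template produces this
    string (plus any model-specific details) from a list of message dicts.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("What is 7 * 8?")
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to generate the assistant turn.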
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) framework. The GRPO method, which is central to its mathematical performance, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
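GRPO dispenses with a learned value model: for each prompt it samples a group of completions, scores each with a scalar reward, and uses the group's reward statistics as the baseline. For math fine-tuning, a common reward is binary correctness of the final boxed answer. The sketch below is a plain-Python illustration of that idea, not TRL's exact API; the helper names and the exact-string answer match are assumptions (real pipelines typically use more robust answer normalization):

```python
import re
import statistics
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


def correctness_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the boxed final answer matches the reference string."""
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO's group baseline: standardize rewards within one prompt's sample group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]


# Example: four sampled completions for one prompt whose reference answer is "56".
completions = ["... \\boxed{56}", "... \\boxed{54}", "no final answer", "... \\boxed{56}"]
rewards = [correctness_reward(c, "56") for c in completions]
advantages = group_relative_advantages(rewards)
```

In TRL, this corresponds to passing a user-defined reward function to the GRPO trainer; the trainer handles sampling the group and applying the group-normalized advantages in the policy update.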
Good For
- Mathematical Problem Solving: Well suited to applications that need step-by-step solutions to competition-style math problems, in the style of the Hendrycks MATH benchmark it was fine-tuned on.
- Research in Mathematical LLMs: Provides a base for further experimentation and development in mathematical reasoning with language models.