jaygala24/Qwen2.5-3B-DAPO-math-reasoning
jaygala24/Qwen2.5-3B-DAPO-math-reasoning is a 3.1-billion-parameter Qwen2.5-based causal language model fine-tuned by jaygala24 and optimized for mathematical reasoning using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. It performs strongly on benchmarks such as GSM8K and MATH-500, reaching an overall pass@1 of 82.16% and pass@32 of 95.99%, and is well suited to applications that require accurate step-by-step mathematical problem solving.
Model Overview
This model, jaygala24/Qwen2.5-3B-DAPO-math-reasoning, is a 3.1-billion-parameter Qwen2.5 variant fine-tuned for mathematical reasoning. It was trained with the PipelineRL framework using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty to strengthen its problem-solving ability. A minimal inference sketch follows.
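The card does not include a usage snippet, so here is a minimal sketch using the standard Hugging Face transformers API, assuming the checkpoint loads like any other Qwen2.5 causal LM; the prompt and generation settings are illustrative, not taken from the model card.

```python
# Minimal inference sketch (assumes standard transformers Qwen2.5 loading;
# the prompt and generation settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen2.5-3B-DAPO-math-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Natalia sold clips to 48 friends in April, "
     "and half as many in May. How many clips did she sell altogether?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the step-by-step solution deterministic; sampling also works.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```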
Key Capabilities
- Advanced Mathematical Reasoning: Optimized for multi-step arithmetic and algebraic problem solving, as evidenced by its GSM8K and MATH-500 results.
- DAPO Fine-tuning: Uses a reinforcement learning algorithm that extends GRPO with clip-higher (asymmetric PPO clipping) and dynamic sampling for improved policy optimization; a minimal sketch of the clipped objective follows this list.
- Strong Benchmark Performance: Achieves notable results on mathematical benchmarks:
  - GSM8K (test): 86.52% pass@1, 97.50% pass@32
  - MATH-500: 70.66% pass@1, 92.00% pass@32
  - Overall: 82.16% pass@1, 95.99% pass@32
- Efficient Training: Trained with a sequence length of 8192 and an effective batch size of 256, using DeepSpeed ZeRO Stage 3 for memory efficiency (an illustrative config sketch also follows this list).
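To make the clip-higher idea concrete, here is a minimal PyTorch sketch of an asymmetric PPO-style clipped objective with no KL term, in the spirit of DAPO. The epsilon values and tensor names are assumptions for illustration and are not taken from this model's training code.

```python
# Illustrative sketch of a DAPO-style asymmetric ("clip-higher") objective
# without a KL penalty. Epsilon values and names are assumptions, not the
# actual training code for this model.
import torch

def dapo_clip_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Token-level clipped policy loss with a wider upper clip range.

    logp_new, logp_old: per-token log-probs under the current / behavior policy.
    advantages: per-token advantage estimates (e.g., group-normalized as in GRPO).
    """
    ratio = torch.exp(logp_new - logp_old)
    # Clip-higher: the ratio may rise further (1 + eps_high) than it may
    # fall (1 - eps_low), which keeps low-probability tokens explorable.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    loss = -torch.minimum(ratio * advantages, clipped * advantages)
    return loss.mean()  # average over tokens rather than per sequence
```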
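Since the card mentions DeepSpeed ZeRO Stage 3, a hypothetical minimal config dict is sketched below to show what the stated setup might look like; the actual training options are not published here.

```python
# Hypothetical minimal DeepSpeed ZeRO Stage 3 config matching the stated
# effective batch size of 256; actual training options are not published here.
ds_config = {
    "train_batch_size": 256,             # effective batch size from the card
    "zero_optimization": {"stage": 3},   # shard params, grads, optimizer state
    "bf16": {"enabled": True},           # assumption: bf16 mixed precision
}
```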
Good For
- Applications requiring high accuracy in mathematical problem-solving.
- Educational tools or systems that need to generate step-by-step mathematical reasoning.
- Research into reinforcement learning for language models, particularly DAPO and its effectiveness in specialized domains.