jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning
jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning is a 1.5 billion parameter Qwen2.5-based language model fine-tuned with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. It is optimized for mathematical reasoning and reports 78.78% pass@1 on GSM8K (test) and 60.22% pass@1 on MATH-500. It supports a 32768-token context length and is intended for applications that need step-by-step mathematical problem solving.
jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning Overview
This model is a fine-tuned version of the Qwen2.5-1.5B base model, developed by jaygala24. Its reinforcement learning algorithm is DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty, a method that extends GRPO with asymmetric PPO clipping, dynamic sampling, token-level loss aggregation, and overlong reward shaping. Training used only mathematical reasoning datasets, including gsm8k_train and math_train.
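Three of the DAPO components named above (group-relative advantages inherited from GRPO, dynamic sampling, and the asymmetrically clipped token-level loss) can be illustrated with a minimal pure-Python sketch. It is not the training code used for this model. The function names are hypothetical, and `eps_high=0.28` is an illustrative value that does not come from this model card, which only states a clip epsilon of 0.2. Overlong reward shaping is omitted.

```python
import math


def group_advantages(rewards):
    """Group-relative advantages for the samples of one prompt."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Small constant avoids division by zero for near-constant groups.
    return [(r - mean) / (std + 1e-6) for r in rewards]


def keep_group(rewards):
    """Dynamic sampling: a group whose rewards are all identical
    yields zero advantages, so it is filtered out."""
    return max(rewards) != min(rewards)


def dapo_loss(ratios, advantages, eps_low=0.2, eps_high=0.28):
    """Clipped surrogate loss averaged over all tokens in the batch.

    ratios: one list per response of per-token probability ratios
            pi_new / pi_old.
    advantages: one advantage per response, shared by its tokens.
    eps_low / eps_high: lower and upper clip ranges (assumed values).
    """
    total, n_tokens = 0.0, 0
    for token_ratios, adv in zip(ratios, advantages):
        for r in token_ratios:
            # Asymmetric clipping: different bounds below and above 1.
            clipped = min(max(r, 1.0 - eps_low), 1.0 + eps_high)
            total += min(r * adv, clipped * adv)
            n_tokens += 1
    # Token-level aggregation: divide by the total token count,
    # not by the number of responses.
    return -total / n_tokens
```

Dividing by the total token count means a long response contributes more terms to the loss than a short one, which is the difference from averaging per response first.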
Key Capabilities & Performance
- Mathematical Reasoning: Trained only on mathematical reasoning data (gsm8k_train, math_train).
- DAPO Fine-tuning: Trained with DAPO, a GRPO-based RL algorithm, without a KL penalty.
- Benchmark Results: Reported pass@k scores on mathematical benchmarks:
  - GSM8K (test): 78.78% pass@1, 95.98% pass@32
  - MATH-500: 60.22% pass@1, 88.40% pass@32
  - Overall (both test sets pooled): 73.68% pass@1, 93.90% pass@32
- Context Length: Supports a context length of 32768 tokens.
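The pass@k figures above depend on how pass@k is estimated. The model card does not say which estimator was used. A common choice is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The sketch below assumes that estimator, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples generated for the problem
    c: samples judged correct
    k: budget being evaluated (k <= n)
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts, n, k):
    """Mean pass@k over problems, given each problem's correct count."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

With k = 1 the estimate for a problem reduces to c / n, the fraction of correct samples, so pass@1 is the average per-sample accuracy.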
Training Details
The model was trained for 1500 steps with a learning rate of 1e-06 and an effective batch size of 256, using DeepSpeed ZeRO Stage 3 to shard model states across devices. The RL algorithm parameters include a clip epsilon of 0.2 and a discount factor of 1.0.
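To show what a discount factor of 1.0 implies, the sketch below computes per-token discounted returns. It assumes a single outcome reward on the final token, which is typical for answer-checked math tasks but is not stated in this model card. The function name is hypothetical.

```python
def discounted_returns(rewards, gamma=1.0):
    """Return-to-go at each token: G_t = r_t + gamma * G_{t+1}."""
    returns = []
    running = 0.0
    # Accumulate from the last token backwards.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Assumed reward layout: 1.0 on the final token, 0.0 elsewhere.
token_rewards = [0.0, 0.0, 0.0, 1.0]
undiscounted = discounted_returns(token_rewards, gamma=1.0)
discounted = discounted_returns(token_rewards, gamma=0.5)
```

With `gamma=1.0` every token receives the full final reward, so early reasoning tokens in a long solution are credited as much as the last ones. With `gamma=0.5` the credit halves at each step back from the end.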
Ideal Use Cases
This model is well suited to:
- Automated mathematical problem solving.
- Educational tools that need to generate step-by-step mathematical reasoning.
- Research and development in AI for mathematics and reasoning tasks.