jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning
The jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning model is a 0.5-billion-parameter language model based on the Qwen2.5 architecture, fine-tuned specifically for mathematical reasoning. It was trained on the GSM8K and MATH datasets using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. The model excels at solving arithmetic and mathematical problems, demonstrating strong performance on benchmarks such as GSM8K and MATH-500, and supports a 32768-token context length.
Model Overview
This model, jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning, is a specialized fine-tuned version of the Qwen2.5-0.5B base model. Its primary focus is on enhancing mathematical reasoning capabilities through advanced reinforcement learning techniques.
Key Capabilities & Training
- Mathematical Reasoning: Specifically optimized for solving mathematical problems, as evidenced by its training on the `gsm8k_train` and `math_train` datasets.
- DAPO Algorithm: Fine-tuned using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty, an extension of GRPO that incorporates asymmetric PPO clipping, dynamic sampling, token-level loss aggregation, and overlong reward shaping.
- Performance: Achieves notable pass@32 scores of 91.36% on GSM8K (test) and 75.00% on MATH-500, with an overall pass@32 of 86.86% across 1819 problems.
- Context Length: Trained with a sequence length of 8192 tokens, while the model's full context window is 32768 tokens, giving it room for complex, multi-step reasoning traces.
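The DAPO objective described above can be sketched in PyTorch. This is a minimal illustration, not this repository's training code: the epsilon defaults follow the DAPO paper's "clip-higher" settings, and all tensor shapes and function names are assumptions for clarity.

```python
# Hedged sketch of DAPO's asymmetric-clip, token-level policy loss
# (no KL penalty) plus its dynamic-sampling filter. Illustrative only;
# variable names and shapes are assumptions, not this repo's code.
import torch

def dapo_loss(logprobs, old_logprobs, advantages, mask,
              eps_low=0.2, eps_high=0.28):
    """Asymmetric-clip policy loss with token-level aggregation.

    logprobs, old_logprobs, advantages, mask: (batch, seq_len) tensors;
    `mask` is 1 for response tokens and 0 for padding.
    """
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    # "Clip-higher": the upper bound (1 + eps_high) is wider than the
    # lower bound, leaving room for low-probability tokens to grow.
    clipped = torch.clamp(ratio, 1 - eps_low, 1 + eps_high) * advantages
    per_token = -torch.minimum(unclipped, clipped)
    # Token-level aggregation: average over every valid token in the
    # batch, so long responses weigh in proportionally to their length.
    return (per_token * mask).sum() / mask.sum()

def keep_prompt(group_rewards):
    """Dynamic sampling: drop prompt groups whose rollout rewards are
    all identical (all-correct or all-wrong), since group-normalized
    advantages are then zero and contribute no gradient signal."""
    return bool(group_rewards.max() > group_rewards.min())
```

Note the absence of any KL term: DAPO removes the reference-policy penalty entirely, relying on clipping alone to constrain the update.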
Ideal Use Cases
- Mathematical Problem Solving: Excellent for applications requiring accurate step-by-step mathematical reasoning and final answer derivation.
- Educational Tools: Can be integrated into platforms for teaching or assisting with math homework.
- Research in RL for Reasoning: Provides a strong baseline for further research into reinforcement learning applications for complex reasoning tasks.
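For the problem-solving and educational settings above, the model can be queried with Hugging Face `transformers`. This is a minimal inference sketch: the example question, chat-template usage, and generation settings are illustrative assumptions, not documented defaults for this checkpoint.

```python
# Minimal inference sketch; downloads the checkpoint from the Hub.
# The prompt and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

question = ("A store sells pencils at 3 for $1. "
            "How much do 24 pencils cost?")
messages = [{"role": "user", "content": question}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Print only the newly generated tokens (the model's reasoning + answer).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is shown for reproducibility; the reported pass@32 numbers imply sampling multiple completions per problem instead.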