jaygala24/Qwen2.5-3B-ReMax-math-reasoning
jaygala24/Qwen2.5-3B-ReMax-math-reasoning is a 3.1-billion-parameter language model fine-tuned from Qwen2.5-3B for mathematical reasoning, using the ReMax reinforcement learning algorithm without a KL penalty. It reaches 85.99% pass@1 on GSM8K (test) and 67.36% pass@1 on MATH-500, which makes it a candidate for applications that need reliable mathematical problem solving.
Overview
The jaygala24/Qwen2.5-3B-ReMax-math-reasoning model is a 3.1-billion-parameter language model built on the Qwen2.5-3B architecture. It was fine-tuned with the ReMax reinforcement learning algorithm, without a KL penalty, using the PipelineRL framework. This targeted training aims to improve its performance on mathematical reasoning.
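As a rough illustration of the idea behind ReMax, the sketch below computes advantages by subtracting the reward of a greedy-decoded response from the reward of each sampled response, then forms a policy-gradient surrogate loss with no KL term. The function names, the plain-Python lists, and the 0/1 rewards are assumptions made for illustration; this is not the PipelineRL implementation used to train the model.

```python
from typing import List


def remax_advantages(sample_rewards: List[float], greedy_reward: float) -> List[float]:
    """Advantage of each sampled response relative to the greedy baseline.

    Hypothetical helper: the baseline is the reward of the response obtained
    by greedy decoding for the same prompt.
    """
    return [r - greedy_reward for r in sample_rewards]


def remax_loss(seq_logprobs: List[float], advantages: List[float]) -> float:
    """Policy-gradient surrogate loss with no KL penalty term.

    seq_logprobs[i] is the summed token log-probability of sampled response i.
    In a real trainer this would be a differentiable tensor; plain floats are
    used here only to show the arithmetic.
    """
    if len(seq_logprobs) != len(advantages):
        raise ValueError("one log-probability per advantage is required")
    weighted = [a * lp for a, lp in zip(advantages, seq_logprobs)]
    return -sum(weighted) / len(weighted)


# Illustrative numbers: the greedy answer was wrong (reward 0.0), one of the
# two sampled answers was right (reward 1.0).
advantages = remax_advantages(sample_rewards=[1.0, 0.0], greedy_reward=0.0)
loss = remax_loss(seq_logprobs=[-2.0, -3.0], advantages=advantages)
print(advantages, loss)
```

Samples that beat the greedy response get a positive advantage and are reinforced; samples that merely match it contribute nothing to the gradient.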
Key Capabilities & Training
- Mathematical Reasoning Focus: The model was specifically trained on mathematical datasets, including gsm8k_train and math_train, to develop strong problem-solving skills.
- ReMax Algorithm: Utilizes the ReMax algorithm with a greedy-decoded response's reward as the baseline for advantages, a key aspect of its reinforcement learning approach.
- Performance Benchmarks: Achieves notable pass@k scores on standard mathematical reasoning benchmarks:
  - GSM8K (test): 85.99% pass@1, 97.50% pass@32
  - MATH-500: 67.36% pass@1, 91.20% pass@32
  - Overall: 80.87% pass@1, 95.77% pass@32 (weighted by problem count).
- Training Details: Trained with a sequence length of 8192, a learning rate of 1e-06, and DeepSpeed ZeRO Stage 3 for efficiency.
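The "Overall" figures in the benchmark list can be reproduced by weighting each benchmark by its problem count (1,319 problems in the GSM8K test set and 500 in MATH-500). The sketch below shows that arithmetic, together with one common way to estimate pass@k from n samples per problem. The card does not say which estimator was used, so the unbiased estimator here is an assumption, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples drawn, c: samples that were correct, k: budget being scored.
    Returns the probability that at least one of k samples, drawn without
    replacement from the n, is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k slots
    return 1.0 - comb(n - c, k) / comb(n, k)


def weighted_average(scores: Sequence[float], counts: Sequence[int]) -> float:
    """Average of per-benchmark scores, weighted by problem count."""
    return sum(s * c for s, c in zip(scores, counts)) / sum(counts)


COUNTS = [1319, 500]  # GSM8K test, MATH-500
overall_pass1 = weighted_average([85.99, 67.36], COUNTS)
overall_pass32 = weighted_average([97.50, 91.20], COUNTS)
print(f"{overall_pass1:.2f} {overall_pass32:.2f}")  # prints "80.87 95.77"
```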
When to Use This Model
This model is particularly well-suited for applications requiring accurate and robust mathematical problem-solving. Developers should consider jaygala24/Qwen2.5-3B-ReMax-math-reasoning for tasks such as:
- Automated math problem solvers.
- Educational tools that require step-by-step mathematical reasoning.
- Any application where precise numerical and logical deduction is critical.
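For readers who want to try the model on a problem, the following is a minimal sketch using the Hugging Face transformers library. The prompt wording in `build_prompt` is an assumption (the prompt format used during training is not documented here), as are the helper names and the generation settings; treat it as a starting point, not as the author's recommended setup.

```python
def build_prompt(problem: str) -> str:
    """Wrap a math problem in a plain instruction.

    Assumption: this step-by-step / boxed-answer wording is a common
    convention for math models, not a documented requirement of this one.
    """
    return (
        f"{problem.strip()}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def solve(
    problem: str,
    model_id: str = "jaygala24/Qwen2.5-3B-ReMax-math-reasoning",
    max_new_tokens: int = 1024,
) -> str:
    """Download the model and greedily decode a solution (hypothetical helper)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(solve("A train travels 60 km in 45 minutes. What is its speed in km/h?"))
```

Greedy decoding (`do_sample=False`) corresponds to a single deterministic attempt; sampling several responses per problem would be needed to approach the pass@32 figures.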