jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning
The jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning model is a 1.5-billion-parameter causal language model based on Qwen2.5, fine-tuned for mathematical reasoning. Developed by jaygala24, it was trained on the GSM8K and MATH datasets using the ReMax reinforcement learning algorithm with no KL penalty. The model is optimized for arithmetic and algebraic problem solving, achieves strong pass@k scores on mathematical benchmarks, and supports a 32768-token context length.
jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning Overview
This model is a specialized 1.5 billion parameter variant of the Qwen2.5-1.5B architecture, fine-tuned by jaygala24 specifically for mathematical reasoning tasks. It utilizes the ReMax reinforcement learning algorithm without a KL penalty, a method designed to enhance performance in specific domains by optimizing directly for reward signals.
Key Capabilities & Training
- Mathematical Reasoning Focus: The model was trained on a combination of the `gsm8k_train` and `math_train` datasets, making it highly proficient at solving arithmetic and algebraic problems.
- Reinforcement Learning: Employs the ReMax algorithm, which uses the reward of a greedy-decoded response as the advantage baseline, combined with a PPO-style policy loss and a KL coefficient of 0.0 (i.e., no KL penalty).
- Performance: Achieves notable pass@k scores on mathematical benchmarks, including 76.71% pass@1 on the GSM8K test set and 57.79% pass@1 on MATH-500; overall pass@32 reaches 94.34% across the combined 1819 problems.
- Context Length: Training used a maximum sequence length of 8192 tokens, accommodating long problem statements and multi-step reasoning traces.
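The ReMax objective described above replaces a learned value-function baseline with the reward of the greedy-decoded response. A minimal pure-Python sketch of that idea is shown below; the function names, the clipping constant, and the use of a PPO-style clipped surrogate are illustrative assumptions, not code from this repository.

```python
import math

def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """ReMax advantage: the reward of a sampled response minus the reward
    of the greedy-decoded response, which acts as the baseline."""
    return sampled_reward - greedy_reward

def ppo_clipped_loss(log_probs, old_log_probs, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate loss for one response, averaged over
    token log-probabilities. A real trainer operates on tensors; this
    scalar version only illustrates the computation."""
    total = 0.0
    for lp, old_lp in zip(log_probs, old_log_probs):
        ratio = math.exp(lp - old_lp)  # importance ratio per token
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # minimize the negative of the clipped objective
        total += -min(ratio * advantage, clipped * advantage)
    return total / len(log_probs)

# Binary correctness reward (hypothetical): sampled answer correct (1.0),
# greedy answer wrong (0.0), so the advantage is positive.
adv = remax_advantage(1.0, 0.0)
loss = ppo_clipped_loss([-0.1, -0.2], [-0.1, -0.2], adv)
```

With identical old and new log-probabilities the importance ratio is 1, so the loss reduces to minus the advantage, which is the expected behavior on the first update of each batch.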
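pass@k metrics like those reported above are commonly computed with the standard unbiased estimator (drawing k samples from n generated attempts, c of which are correct); whether this model card used exactly this estimator is an assumption. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n attempts with c correct,
    is correct. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # fewer than k incorrect attempts exist, so some sample must be correct
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 32 attempts per problem, 8 correct.
score = pass_at_k(32, 8, 1)  # equals 8/32 = 0.25
```

For k = 1 the estimator reduces to the fraction of correct attempts, c/n, which matches the intuitive reading of pass@1.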
Use Cases
- Automated Math Problem Solving: Ideal for applications requiring accurate step-by-step mathematical reasoning and final answer derivation.
- Educational Tools: Can be integrated into platforms for generating solutions or explanations for math problems.
- Research in RL for Reasoning: Serves as a strong baseline or component for further research into reinforcement learning applications for complex reasoning tasks.