jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning
jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning is a 1.5-billion-parameter causal language model fine-tuned from Qwen2.5-1.5B for mathematical reasoning. It was trained with the RLOO (REINFORCE Leave-One-Out) algorithm without a KL penalty and a context length of 8192 tokens. The model performs strongly on benchmarks such as GSM8K and MATH-500, making it suitable for applications that require precise mathematical problem-solving.
Model Overview
This model, jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning, is a 1.5-billion-parameter language model derived from Qwen2.5-1.5B. Its primary distinction is fine-tuning with the RLOO (REINFORCE Leave-One-Out) algorithm without a KL penalty, optimized specifically for mathematical reasoning. Training was carried out with the PipelineRL framework.
Key Capabilities & Performance
The model excels in mathematical problem-solving, as evidenced by its evaluation results on standard benchmarks:
- GSM8K (test): Achieved 78.44% pass@1 and 96.06% pass@32.
- MATH-500: Achieved 60.14% pass@1 and 89.80% pass@32.
- Overall: Demonstrated 73.41% pass@1 and 94.34% pass@32 across the combined 1819 problems of the two benchmarks.
These results were obtained by generating 32 samples per problem at temperature 1.0. The RLOO algorithm uses the leave-one-out mean reward, the average reward of the other samples in a group, as the baseline for its REINFORCE-style policy loss, which reduces gradient variance without requiring a learned value function.
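When multiple samples are drawn per problem, pass@k is commonly computed with the standard unbiased estimator (Chen et al., 2021) rather than by literally averaging k-sized draws. A minimal sketch, assuming 32 generations per problem and a per-sample correctness check; the function name `pass_at_k` is illustrative, not part of this repository:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n generated samples is correct,
    given that c of the n samples are correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so every k-subset
        # must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 32 samples generated for a problem, 8 of them correct.
p1 = pass_at_k(32, 8, 1)    # == 8/32 = 0.25
p32 = pass_at_k(32, 8, 32)  # == 1.0, since at least one correct sample exists
```

Benchmark-level pass@1 and pass@32 are then the averages of these per-problem estimates.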
Good For
- Applications requiring robust mathematical reasoning.
- Tasks involving step-by-step problem-solving in mathematics.
- Developers looking for a compact model (1.5B parameters) with strong math capabilities.
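The leave-one-out baseline described above can be sketched in a few lines. This is a minimal illustration of the advantage computation only (not the training code used for this model), assuming a group of k sampled completions per prompt with scalar rewards; the function name `rloo_advantages` is hypothetical:

```python
import numpy as np

def rloo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Leave-one-out advantages for one prompt's group of k completions.

    Each sample's baseline is the mean reward of the OTHER k-1 samples,
    so no learned value function is needed. Algebraically this equals
    k/(k-1) * (r_i - mean(r)).
    """
    k = rewards.shape[-1]
    total = rewards.sum(axis=-1, keepdims=True)
    baseline = (total - rewards) / (k - 1)  # leave-one-out mean reward
    return rewards - baseline

# Example: 4 completions for one prompt, binary correctness rewards.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
adv = rloo_advantages(rewards)
# adv == [ 2/3, -2/3, -2/3,  2/3]; the REINFORCE-style loss would then
# weight each completion's log-probability by its advantage.
```

Because the baseline is built from the other samples in the same group, the advantages of a group always sum to zero, which is what keeps the gradient estimate low-variance without a critic.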