jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning

TEXT GENERATION

  • Model size: 1.5B parameters
  • Quantization: BF16
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Apr 23, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning is a 1.5-billion-parameter causal language model fine-tuned from Qwen2.5-1.5B. It specializes in mathematical reasoning, having been trained with the RLOO (REINFORCE Leave-One-Out) algorithm without a KL penalty. The model performs strongly on benchmarks such as GSM8K and MATH-500, making it suitable for applications that require precise mathematical problem solving. It was trained with a context length of 8192 tokens.


Model Overview

This model, jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning, is a 1.5-billion-parameter language model derived from Qwen2.5-1.5B. It is distinguished by its fine-tuning with the RLOO (REINFORCE Leave-One-Out) algorithm without a KL penalty, optimized specifically for mathematical reasoning tasks. Training used the PipelineRL framework.
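Since this is a standard causal language model derived from Qwen2.5-1.5B, it should load through the Hugging Face transformers API. The sketch below is illustrative: the prompt wording, helper name, and generation settings are assumptions, since the model card does not document a required prompt template.

```python
import os

MODEL_ID = "jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning"

def build_prompt(question: str) -> str:
    """Wrap a math question in a plain instruction prompt.

    This format is an assumption; the model card does not specify
    a required template.
    """
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {question}\nSolution:"
    )

# The model download is heavy, so the inference demo is gated behind
# an environment flag.
if os.environ.get("RUN_MODEL_DEMO"):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    inputs = tokenizer(build_prompt("What is 12 * 7?"), return_tensors="pt")
    # Sampling at temperature 1.0 mirrors the evaluation setup on the card.
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=1.0)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

BF16 loading matches the quantization listed in the model metadata; adjust the dtype if your hardware lacks bfloat16 support.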

Key Capabilities & Performance

The model excels in mathematical problem-solving, as evidenced by its evaluation results on standard benchmarks:

  • GSM8K (test): Achieved 78.44% pass@1 and 96.06% pass@32.
  • MATH-500: Achieved 60.14% pass@1 and 89.80% pass@32.
  • Overall: Demonstrated 73.41% pass@1 and 94.34% pass@32 across 1819 problems.
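For reference, pass@k over n generated samples is conventionally computed with the unbiased estimator 1 - C(n-c, k)/C(n, k), where c is the number of correct samples for a problem. The sketch below assumes this standard formula; the actual evaluation harness used for these numbers is not specified on the card.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn (without replacement) from n generations, of which
    c are correct, solves the problem."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so every size-k subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 32 samples per problem, as in the evaluation above:
print(pass_at_k(32, 8, 1))   # pass@1 equals the per-sample accuracy (0.25)
print(pass_at_k(32, 8, 32))  # any correct sample makes pass@32 equal 1.0
```

Per-benchmark pass@k is then the mean of this estimate over all problems.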

These results are based on generating 32 samples per problem with a temperature of 1.0. The RLOO algorithm uses a leave-one-out mean reward as the baseline for its REINFORCE-style policy loss, contributing to its specialized reasoning abilities.
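The leave-one-out baseline can be sketched concretely: for k sampled completions per prompt with rewards r_1..r_k, each sample's advantage is its reward minus the mean reward of the other k-1 samples. The function below is a pure-Python illustration of that computation; the name and interface are not taken from the PipelineRL codebase.

```python
def rloo_advantages(rewards: list[float]) -> list[float]:
    """RLOO advantage for each of k sampled completions of one prompt:
    reward minus the leave-one-out mean of the other k-1 rewards.
    Illustrative sketch, not the PipelineRL API."""
    k = len(rewards)
    total = sum(rewards)
    # Baseline for sample i is mean(rewards without i) = (total - r_i) / (k - 1).
    return [r - (total - r) / (k - 1) for r in rewards]

# Example: four samples, only the first earned reward 1.0.
# The correct sample gets a positive advantage, the rest negative.
print(rloo_advantages([1.0, 0.0, 0.0, 0.0]))
```

These advantages weight the REINFORCE-style policy loss (roughly, minus advantage times the completion's log-probability); a useful property of the leave-one-out baseline is that the advantages for one prompt always sum to zero.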

Good For

  • Applications requiring robust mathematical reasoning.
  • Tasks involving step-by-step problem-solving in mathematics.
  • Developers looking for a compact model (1.5B parameters) with strong math capabilities.