jaygala24/Qwen2.5-0.5B-RLOO-math-reasoning

TEXT GENERATION

  • Model Size: 0.5B
  • Quantization: BF16
  • Context Length: 32K
  • Concurrency Cost: 1
  • Published: Apr 23, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

jaygala24/Qwen2.5-0.5B-RLOO-math-reasoning is a 0.5 billion parameter Qwen2.5-based causal language model fine-tuned by jaygala24 using RLOO (REINFORCE Leave-One-Out) without KL penalty. This model is specifically optimized for mathematical reasoning tasks, demonstrating strong performance on benchmarks like GSM8K and MATH-500. With a 32K context length, it excels at generating step-by-step mathematical solutions.
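A minimal inference sketch with Hugging Face `transformers`, assuming the checkpoint is published in the standard format with a Qwen2.5 chat template; the prompt and generation settings below are illustrative, not taken from the model card.

```python
# Sketch: generate a step-by-step math solution with the fine-tuned model.
# Assumes the repo id below is available and includes a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen2.5-0.5B-RLOO-math-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative GSM8K-style word problem.
messages = [{"role": "user",
             "content": "Natalia sold clips to 48 friends in April, and half "
                        "as many in May. How many clips did she sell in total?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512)

# Decode only the newly generated tokens (the model's solution).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding is shown for simplicity; the reported pass@32 numbers imply sampling multiple completions per problem in evaluation.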


Model Overview

This model, jaygala24/Qwen2.5-0.5B-RLOO-math-reasoning, is a specialized fine-tuned version of the Qwen2.5-0.5B base model. Its primary focus is mathematical reasoning, achieved by fine-tuning with RLOO (REINFORCE Leave-One-Out) without a KL penalty.

Key Capabilities & Training

  • Mathematical Reasoning: The model is explicitly trained and optimized for solving mathematical problems, as evidenced by its evaluation on gsm8k and math datasets.
  • RLOO Algorithm: It leverages the RLOO algorithm, which for each sampled completion uses the mean reward of the other completions (the leave-one-out mean) as a variance-reducing baseline in the policy-gradient loss.
  • Performance: Achieves notable pass@k scores on mathematical benchmarks:
    • GSM8K (test): 89.69% pass@32
    • MATH-500: 75.00% pass@32
    • Overall: 85.65% pass@32 across 1819 problems.
  • Training Framework: Developed using PipelineRL, with a sequence length of 8192 and trained for 1500 steps.
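The leave-one-out baseline mentioned above can be sketched in a few lines; this is an illustrative reconstruction of the general RLOO advantage computation, not code from the model's actual training run.

```python
# Sketch of the RLOO (leave-one-out) advantage computation for one prompt:
# sample k completions, score each with a scalar reward, and baseline each
# reward against the mean of the other k-1 rewards.
def rloo_advantages(rewards: list[float]) -> list[float]:
    """advantage_i = reward_i - mean(rewards of the other k-1 completions)."""
    k = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

# Example: 4 completions for one prompt with binary correctness rewards.
advs = rloo_advantages([1.0, 0.0, 0.0, 1.0])
```

These advantages then weight the log-probabilities of the sampled completions in the policy-gradient loss; by construction they sum to zero across the group, so no separate learned value baseline is needed.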

Good For

  • Applications requiring accurate step-by-step mathematical problem-solving.
  • Integration into systems where mathematical reasoning is a core component.
  • Researchers exploring RL-based fine-tuning methods for specialized tasks, particularly RLOO without KL penalty.