Name: jaygala24/Qwen2.5-3B-RLOO-math-reasoning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jaygala24

Overview

jaygala24/Qwen2.5-3B-RLOO-math-reasoning is a 3.1-billion-parameter language model derived from Qwen2.5-3B. It was fine-tuned with the RLOO (REINFORCE Leave-One-Out) algorithm, without a KL penalty, to improve mathematical reasoning.

Key Capabilities & Training

Mathematical Reasoning: Fine-tuned for math problem solving on the gsm8k_train and math_train datasets.
RLOO Algorithm: Trained with a REINFORCE-style policy loss in which the advantage baseline for each sample is the leave-one-out mean reward, i.e. the mean reward of the other samples drawn for the same prompt.
Performance: Reported results on math reasoning benchmarks:
- GSM8K (test): 86.47% pass@1, 97.12% pass@32
- MATH-500: 69.59% pass@1, 90.80% pass@32
- Overall: 81.83% pass@1, 95.38% pass@32 across 1819 problems (the 1319 GSM8K test problems plus the 500 MATH-500 problems).
Context Length: Trained with a maximum sequence length of 8192 tokens.
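
The leave-one-out baseline named in the RLOO item above can be illustrated with a small sketch. This is not the author's training code: the function names, the binary correctness reward, the group of four samples, and the use of a single summed log-probability per completion are all assumptions made for illustration. In real training the log-probabilities would be differentiable tensors rather than plain floats.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k completions sampled from one prompt."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # Baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


def reinforce_loss(seq_logprobs: List[float], advantages: List[float]) -> float:
    """REINFORCE-style surrogate loss with no KL term (illustrative only)."""
    weighted = sum(a * lp for a, lp in zip(advantages, seq_logprobs))
    return -weighted / len(advantages)


# Hypothetical group: four completions, reward 1.0 if the final answer is correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = rloo_advantages(rewards)

# Hypothetical summed log-probabilities of each sampled completion.
seq_logprobs = [-2.0, -3.0, -1.0, -2.5]
loss = reinforce_loss(seq_logprobs, advantages)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, and the advantages within a group always sum to zero, so no separately learned value model is needed for the baseline.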

Why this model is different

Unlike general-purpose LLMs, this model is fine-tuned with RLOO specifically for step-by-step mathematical problem solving. Its training and evaluation focus on accuracy in arithmetic and algebraic reasoning, making it a strong candidate for applications that need reliable mathematical output.

Full Model Card (README)