Name: jaygala24/Qwen3-1.7B-RLOO-math-reasoning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jaygala24

Model Overview

This model, jaygala24/Qwen3-1.7B-RLOO-math-reasoning, is a fine-tuned version of the Qwen3-1.7B base model, optimized for mathematical reasoning with the RLOO (REINFORCE Leave-One-Out) algorithm and no KL penalty. RLOO is a reinforcement learning method that samples several completions per prompt and, for each completion, uses the mean reward of the other completions as the baseline in the policy loss. Training was conducted using the PipelineRL framework.
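The leave-one-out baseline described above can be sketched in a few lines. This is a minimal illustration of the general RLOO advantage computation, not the PipelineRL implementation; the function name `rloo_advantages` and the 0/1 reward values are assumptions made for the example.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k completions sampled from one prompt.

    Hypothetical helper for illustration; not taken from PipelineRL.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two completions per prompt")
    total = sum(rewards)
    # The baseline for completion i is the mean reward of the other k - 1
    # completions, so a completion's own reward never enters its baseline.
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    # Assumed rewards: 1.0 for a correct final answer, 0.0 otherwise.
    for adv in rloo_advantages([1.0, 0.0, 0.0, 1.0]):
        print(f"{adv:+.4f}")
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, and the advantages for a prompt always sum to zero.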

Key Capabilities & Performance

Mathematical Reasoning: Solves grade-school and competition-style math problems, as measured on the GSM8K and MATH-500 benchmarks.
RLOO Optimization: Reasoning is tuned with reinforcement learning, using a leave-one-out baseline in place of a learned value model.
Benchmark Results: pass@32 scores, where a problem counts as solved if at least one of 32 sampled answers is correct:
- GSM8K (test): 96.66% pass@32
- MATH-500: 93.80% pass@32
- Overall: 95.88% pass@32 across 1819 problems (GSM8K test and MATH-500 combined).
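As a rough illustration of how these figures relate, the sketch below implements the commonly used unbiased pass@k estimator and checks that the overall score is the problem-weighted pool of the two benchmarks. The listing does not say which estimator or how many samples per problem were used, so `pass_at_k` is an assumption. The split of 1319 + 500 problems follows from the 1819 total, and the solved counts (1275 and 469) are inferred from the rounded percentages rather than reported.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pooled_rate(parts: List[Tuple[int, int]]) -> float:
    """Pool (solved, total) counts from several benchmarks into one rate."""
    solved = sum(s for s, _ in parts)
    total = sum(t for _, t in parts)
    return solved / total


if __name__ == "__main__":
    # With exactly 32 samples, pass@32 is 1.0 if any sample is correct.
    print(pass_at_k(n=32, c=1, k=32))
    # Inferred counts: 1275/1319 (GSM8K test) and 469/500 (MATH-500).
    print(f"{pooled_rate([(1275, 1319), (469, 500)]):.2%}")
```

Under these inferred counts the pooled rate comes out at 95.88%, matching the reported overall score.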

Training Details

Datasets: Trained on the gsm8k_train and math_train datasets.
Algorithm: RLOO with a REINFORCE-style policy loss, a KL coefficient of 0.0, and a clip epsilon of 0.02.
Hyperparameters: Learning rate of 1e-06, bf16 precision, and DeepSpeed ZeRO Stage 3 to shard model state across GPUs.

Ideal Use Cases

Automated Math Problem Solving: Generating step-by-step solutions for arithmetic and algebraic problems.
Educational Tools: Assisting in the development of AI tutors or problem-solving aids for mathematics.
Research in RL for Reasoning: A strong baseline or component for further research into reinforcement learning applications for complex reasoning tasks.
