Name: jaygala24/Qwen3-4B-RLOO-math-reasoning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jaygala24

Model Overview

This model, jaygala24/Qwen3-4B-RLOO-math-reasoning, is a 4-billion-parameter variant of the Qwen3-4B base model, fine-tuned for mathematical reasoning. It is trained with RLOO (REINFORCE Leave-One-Out), a reinforcement learning method that uses the leave-one-out mean reward as the advantage baseline and applies no KL penalty, which distinguishes its training from many other RLHF models.
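To make the leave-one-out baseline concrete, here is a minimal sketch of the advantage computation for one group of responses sampled from the same prompt. With k responses and rewards r_1..r_k, the advantage of response i is r_i minus the mean of the other k-1 rewards. The function name rloo_advantages and the example rewards are illustrative assumptions, not taken from the author's training code.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for one group of responses to a single prompt.

    Hypothetical helper for illustration; not the author's implementation.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two responses per group")
    total = sum(rewards)
    # Baseline for response i is the mean reward of the other k - 1 responses.
    return [r - (total - r) / (k - 1) for r in rewards]


# Example: four sampled answers, reward 1.0 if correct and 0.0 otherwise (assumed).
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = rloo_advantages(group_rewards)
print(advantages)
```

Correct answers in the group receive a positive advantage and incorrect ones a negative advantage, and the advantages within a group sum to zero. No KL term appears because, as stated above, the training uses none.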

Key Capabilities & Training

Mathematical Reasoning: The model is trained on the gsm8k_train and math_train datasets, covering arithmetic word problems and more advanced mathematical problems.
RLOO Algorithm: Uses a REINFORCE-style policy loss with group-structured sampling, where each response's advantage is its reward minus the mean reward of the other responses in its group.
Performance: Reported pass@k scores on mathematical benchmarks:
- GSM8K (test): 90.08% pass@1, 97.73% pass@32
- MATH-500: 79.19% pass@1, 96.00% pass@32
- Overall: 87.09% pass@1, 97.25% pass@32
Context Length: Supports a context window of 32,768 tokens, which leaves room for long multi-step solutions.
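For readers unfamiliar with the pass@k figures above, the sketch below shows the commonly used unbiased estimator: given n sampled answers to a problem, of which c are correct, it estimates the probability that at least one of k randomly chosen samples is correct. The listing does not say how many samples were drawn per problem or which estimator produced the reported numbers, so treat this as an assumption about the metric rather than the author's evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples drawn, c: correct samples, k: budget being scored.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 4 samples drawn, 1 of them correct.
print(pass_at_k(4, 1, 1))
print(pass_at_k(4, 1, 2))
```

A benchmark-level score is then the mean of this per-problem value over all problems in the test set.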

Ideal Use Cases

This model is particularly well-suited for applications requiring:

Accurate mathematical problem-solving.
Step-by-step reasoning in quantitative tasks.
Integration into systems where robust mathematical capabilities are critical.
