jaygala24/Qwen3-1.7B-RLOO-math-reasoning

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 23, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

jaygala24/Qwen3-1.7B-RLOO-math-reasoning is a 1.7-billion-parameter causal language model based on Qwen3 and fine-tuned by jaygala24 for mathematical reasoning. It was trained with the RLOO (REINFORCE Leave-One-Out) algorithm without a KL penalty. On GSM8K and MATH-500 it reaches an overall pass@32 of 95.88%. It supports a 32,768-token context length and is intended for multi-step mathematical problem solving.


Model Overview

jaygala24/Qwen3-1.7B-RLOO-math-reasoning is a fine-tuned version of the Qwen3-1.7B base model, optimized for mathematical reasoning with the RLOO (REINFORCE Leave-One-Out) algorithm and no KL penalty. RLOO is a reinforcement learning method in which the baseline subtracted from each sampled response's reward is the mean reward of the other responses sampled for the same prompt. Training was conducted with the PipelineRL framework.
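
The leave-one-out baseline can be illustrated with a minimal sketch. This is not the PipelineRL implementation; the function name `rloo_advantages` and the 0/1 reward values are assumptions made for illustration only.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k responses sampled for one prompt.

    Hypothetical helper for illustration; not taken from PipelineRL.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # The baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


# Four sampled solutions to one problem; reward 1.0 means a correct final answer
# (an assumed reward scheme, the card does not describe the reward function).
advantages = rloo_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct samples receive a positive advantage and incorrect ones a negative advantage, and the advantages within a group sum to zero, so no separate value network is needed to estimate the baseline.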

Key Capabilities & Performance

  • Mathematical Reasoning: Solves grade-school and competition-style math problems, as measured on GSM8K and MATH-500.
  • RLOO Optimization: Fine-tuned with reinforcement learning, using a leave-one-out mean reward as the baseline for the policy loss.
  • Benchmark Results: Reported pass@32 scores:
    • GSM8K (test): 96.66% pass@32
    • MATH-500: 93.80% pass@32
    • Overall: 95.88% pass@32 across 1819 problems.
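
The overall figure is consistent with a problem-weighted combination of the two benchmarks. The sketch below checks this, assuming the standard split sizes of 1319 problems for the GSM8K test set and 500 for MATH-500 (they sum to the 1819 reported above); the solved counts are back-computed from the rounded percentages and are therefore inferences, not reported numbers. The card does not say how pass@32 was estimated, so `pass_at_k` shows the commonly used unbiased estimator purely as an illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Assumed split sizes (standard for these benchmarks) and the reported scores.
sizes = {"gsm8k_test": 1319, "math_500": 500}
scores = {"gsm8k_test": 0.9666, "math_500": 0.9380}

# Back-compute how many problems were solved in each benchmark (inferred).
solved = {name: round(scores[name] * sizes[name]) for name in sizes}
overall = sum(solved.values()) / sum(sizes.values())
print(solved, f"{overall:.2%}")
```

With exactly 32 samples per problem, pass@32 reduces to asking whether at least one of the 32 samples is correct.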

Training Details

  • Datasets: Trained on gsm8k_train and math_train datasets.
  • Algorithm: RLOO with a REINFORCE-style policy loss, a KL coefficient of 0.0 (no KL penalty), and a clip epsilon of 0.02.
  • Hyperparameters: Trained with a learning rate of 1e-06, bf16 precision, and DeepSpeed ZeRO Stage 3 for efficiency.
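
The card does not say how the clip epsilon enters the loss. The sketch below shows one possible reading, under loudly stated assumptions: the importance ratio between the current policy and the sampling policy is clamped to [1 - ε, 1 + ε] and used as a weight on a REINFORCE term, and the KL term is multiplied by a coefficient of 0.0 so it contributes nothing. The function `clipped_reinforce_loss` and its arguments are hypothetical and are not taken from PipelineRL.

```python
import math
from typing import List, Optional

EPSILON = 0.02  # clip epsilon, as listed above
KL_COEF = 0.0   # KL coefficient, as listed above (penalty disabled)


def clipped_reinforce_loss(
    new_logprobs: List[float],
    old_logprobs: List[float],
    advantages: List[float],
    kl_estimates: Optional[List[float]] = None,
) -> float:
    """Mean per-token loss under the assumed clipping scheme (illustrative)."""
    terms = []
    for i, (new_lp, old_lp, adv) in enumerate(
        zip(new_logprobs, old_logprobs, advantages)
    ):
        # Importance ratio between the current and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Assumption: the ratio is clamped to [1 - eps, 1 + eps].
        weight = min(max(ratio, 1.0 - EPSILON), 1.0 + EPSILON)
        term = -weight * new_lp * adv
        if kl_estimates is not None:
            # With KL_COEF = 0.0 this term vanishes.
            term += KL_COEF * kl_estimates[i]
        terms.append(term)
    return sum(terms) / len(terms)


# On-policy case: identical log-probabilities give a ratio of exactly 1.
loss = clipped_reinforce_loss([-1.0, -2.0], [-1.0, -2.0], [0.5, -0.5])
print(loss)
```

In an autograd framework the clamped weight would typically be treated as a constant so that gradients flow only through the log-probability; that detail is also an assumption here.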

Ideal Use Cases

  • Automated Math Problem Solving: Generating step-by-step solutions for arithmetic and algebraic problems.
  • Educational Tools: Assisting in the development of AI tutors or problem-solving aids for mathematics.
  • Research in RL for Reasoning: A strong baseline or component for further research into reinforcement learning applications for complex reasoning tasks.