jaygala24/Qwen2.5-3B-ReMax-math-reasoning

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Published: Apr 13, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

jaygala24/Qwen2.5-3B-ReMax-math-reasoning is a 3.1-billion-parameter language model fine-tuned from Qwen2.5-3B for mathematical reasoning, using the ReMax reinforcement learning algorithm without a KL penalty. It reaches 85.99% pass@1 on GSM8K (test) and 67.36% pass@1 on MATH-500, which makes it a candidate for applications that need step-by-step math problem solving.


Overview

The jaygala24/Qwen2.5-3B-ReMax-math-reasoning model is a 3.1-billion-parameter language model fine-tuned from Qwen2.5-3B. It was trained with the PipelineRL framework using the ReMax reinforcement learning algorithm without a KL penalty, with the goal of improving performance on mathematical reasoning.
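To make the ReMax idea concrete, here is a minimal sketch of how advantages can be formed when the reward of a greedy-decoded response serves as the baseline and no KL term is added to the loss. The function names, the binary rewards, and the log-probabilities are illustrative assumptions; this is not the PipelineRL implementation.

```python
from typing import List


def remax_advantages(sample_rewards: List[float], greedy_reward: float) -> List[float]:
    """Advantage of each sampled response: its reward minus the greedy response's reward."""
    return [r - greedy_reward for r in sample_rewards]


def remax_loss(seq_logprobs: List[float], advantages: List[float]) -> float:
    """REINFORCE-style loss: negative mean of advantage * sequence log-probability.

    No KL penalty term is added, matching the setup described in the card.
    """
    n = len(seq_logprobs)
    return -sum(a * lp for a, lp in zip(advantages, seq_logprobs)) / n


# Hypothetical binary rewards for one prompt (1.0 = correct final answer).
sampled_rewards = [1.0, 0.0]
greedy_reward = 1.0
advantages = remax_advantages(sampled_rewards, greedy_reward)

# Hypothetical summed token log-probabilities of the two sampled responses.
loss = remax_loss([-2.0, -4.0], advantages)
print(advantages, loss)
```

With this baseline, a sampled response that only matches the greedy answer gets zero advantage, so the gradient is driven by samples that do better or worse than greedy decoding.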

Key Capabilities & Training

  • Mathematical Reasoning Focus: The model was trained on the mathematical datasets gsm8k_train and math_train.
  • ReMax Algorithm: Uses the reward of a greedy-decoded response as the baseline when computing advantages, so each sampled response is scored relative to the greedy one.
  • Performance Benchmarks: Reports the following pass@k scores on standard mathematical reasoning benchmarks:
    • GSM8K (test): 85.99% pass@1, 97.50% pass@32
    • MATH-500: 67.36% pass@1, 91.20% pass@32
    • Overall: 80.87% pass@1, 95.77% pass@32 (weighted by problem count).
  • Training Details: Trained with a sequence length of 8192 and a learning rate of 1e-06, using DeepSpeed ZeRO Stage 3 for efficiency.
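The "Overall" row can be checked by weighting each benchmark's score by its problem count. The sketch below assumes 1,319 problems in the GSM8K test split and 500 in MATH-500; the card does not state these counts, and the helper name is hypothetical.

```python
from typing import List, Tuple


def weighted_score(scores_and_counts: List[Tuple[float, int]]) -> float:
    """Average of per-benchmark scores, weighted by number of problems."""
    total = sum(n for _, n in scores_and_counts)
    return sum(score * n for score, n in scores_and_counts) / total


GSM8K_TEST_N = 1319  # assumed size of the GSM8K test split
MATH500_N = 500      # assumed size of MATH-500

pass_at_1 = weighted_score([(85.99, GSM8K_TEST_N), (67.36, MATH500_N)])
pass_at_32 = weighted_score([(97.50, GSM8K_TEST_N), (91.20, MATH500_N)])

print(f"pass@1:  {pass_at_1:.2f}%")   # 80.87%
print(f"pass@32: {pass_at_32:.2f}%")  # 95.77%
```

Under these assumed counts, the weighted averages reproduce the reported overall figures.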

When to Use This Model

This model targets applications that need accurate, step-by-step mathematical problem solving. Developers should consider jaygala24/Qwen2.5-3B-ReMax-math-reasoning for tasks such as:

  • Automated math problem solvers.
  • Educational tools that require step-by-step mathematical reasoning.
  • Any application where precise numerical and logical deduction is critical.
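
For any of these tasks, a minimal sketch of running the model with the Hugging Face transformers library might look like the following. It assumes the weights are available on the Hugging Face Hub under the same model ID and that a plain-text prompt works; the prompt wording and the helper names build_prompt and solve are hypothetical, since the card does not document a prompt format.

```python
def build_prompt(problem: str) -> str:
    # Hypothetical prompt format; the model card does not specify one.
    return (
        "Solve the following math problem step by step.\n\n"
        f"Problem: {problem}\n\nSolution:"
    )


def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the prompt helper can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "jaygala24/Qwen2.5-3B-ReMax-math-reasoning"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # BF16 matches the precision listed in the card's metadata.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    # Greedy decoding; sampling settings are left to the caller's needs.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(solve("A train travels 60 km in 45 minutes. What is its speed in km/h?"))
```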