jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning model is a 1.5-billion-parameter Qwen2.5-based causal language model fine-tuned for mathematical reasoning. Developed by jaygala24, it was trained on the GSM8K and MATH datasets using the ReMax reinforcement learning algorithm without a KL penalty. The model is optimized for complex arithmetic and algebraic problem solving, demonstrates strong pass@k scores on mathematical benchmarks, and supports a 32768-token context length.


jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning Overview

This model is a specialized 1.5 billion parameter variant of the Qwen2.5-1.5B architecture, fine-tuned by jaygala24 specifically for mathematical reasoning tasks. It utilizes the ReMax reinforcement learning algorithm without a KL penalty, a method designed to enhance performance in specific domains by optimizing directly for reward signals.
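The model can be loaded like any Hugging Face causal language model. The sketch below is an assumption about usage, not documented by the card: the prompt format is a plain question (the card does not specify a chat template, so check the tokenizer's actual template before relying on this), and greedy decoding is used to match the model's evaluation setup.

```python
# Minimal usage sketch (assumed, not from the model card): load the model
# and generate a greedy-decoded solution to a math word problem.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Example GSM8K-style prompt; the exact prompt format is an assumption.
prompt = "Natalia sold clips to 48 friends in April, and then half as many in May. How many clips did she sell altogether?"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding (do_sample=False), mirroring the pass@1 evaluation setting.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```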

Key Capabilities & Training

  • Mathematical Reasoning Focus: The model was trained on a combination of gsm8k_train and math_train datasets, making it highly proficient in solving arithmetic and algebraic problems.
  • Reinforcement Learning: Employs the ReMax algorithm, which uses the reward of the greedy-decoded response as the advantage baseline, combined with a PPO-style policy loss and a KL coefficient of 0.0 (i.e., no KL penalty).
  • Performance: Achieves notable pass@k scores on mathematical benchmarks, including 76.71% pass@1 on GSM8K (test) and 57.79% pass@1 on MATH-500, with overall pass@32 reaching 94.34% across 1819 problems.
  • Context Length: Trained with an 8192-token context window; the model supports sequences up to 32768 tokens at inference, accommodating long problem statements and multi-step reasoning.
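The ReMax baseline described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the author's training code: the reward values and log-probabilities are placeholders, and only the advantage computation and the no-KL policy loss are shown.

```python
# Sketch of the ReMax advantage baseline (an illustration based on the
# description above, not the actual training loop). ReMax subtracts the
# reward of the greedy-decoded response from the reward of each sampled
# response, giving a per-prompt baseline with no learned value network.

def remax_advantages(sampled_rewards, greedy_reward):
    """Advantage for each sampled response: r(sample) - r(greedy)."""
    return [r - greedy_reward for r in sampled_rewards]

def policy_loss(log_probs, advantages):
    """REINFORCE-style loss: -advantage * log-prob, averaged over samples.
    With a KL coefficient of 0.0, no KL-penalty term is added."""
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / len(advantages)

# Toy example: two sampled responses, greedy response got reward 1.0.
advs = remax_advantages([1.0, 0.0], greedy_reward=1.0)  # → [0.0, -1.0]
loss = policy_loss([-0.5, -0.7], advs)
```

Because the baseline is just one extra greedy rollout per prompt, ReMax avoids training a separate critic while still reducing gradient variance.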

Use Cases

  • Automated Math Problem Solving: Ideal for applications requiring accurate step-by-step mathematical reasoning and final answer derivation.
  • Educational Tools: Can be integrated into platforms for generating solutions or explanations for math problems.
  • Research in RL for Reasoning: Serves as a strong baseline or component for further research into reinforcement learning applications for complex reasoning tasks.
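For reproducing evaluations like those quoted above, pass@k is conventionally computed with the unbiased estimator of Chen et al. (2021); whether this card's numbers use exactly that estimator is an assumption.

```python
# Unbiased pass@k estimator: given n samples per problem of which c are
# correct, estimate the probability that at least one of k drawn samples
# is correct. 1 - C(n-c, k) / C(n, k).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 32 samples, 8 correct -> pass@1 = 8/32 = 0.25
print(pass_at_k(32, 8, 1))  # → 0.25
```

Averaging `pass_at_k` over all problems in a test set yields the aggregate pass@1 and pass@32 figures reported in the Performance bullet.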