jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning

Text generation · Open weights

- Model size: 0.5B
- Quantization: BF16
- Context length: 32k
- Concurrency cost: 1
- Published: Apr 13, 2026
- License: apache-2.0
- Architecture: Transformer

The jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning model is a 0.5-billion-parameter language model fine-tuned from Qwen2.5-0.5B and optimized for mathematical reasoning using the ReMax reinforcement learning algorithm, applied without a KL penalty. It is built to solve arithmetic and mathematical problems step by step, and it performs strongly for its size on benchmarks such as GSM8K and MATH-500. Its primary application is in scenarios requiring accurate, step-by-step mathematical problem-solving.


Model Overview

This model, jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning, is a 0.5-billion-parameter language model fine-tuned from the Qwen2.5-0.5B base model. Its core differentiator is the application of the ReMax reinforcement learning algorithm (without a KL penalty), tailored to enhance mathematical reasoning capabilities. Training used the PipelineRL framework.
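To make the training objective concrete, here is a minimal sketch of the ReMax policy-gradient step. This is an illustration of the algorithm's core idea, not the repository's actual training code (which, per the card, used the PipelineRL framework); the function names are hypothetical.

```python
# Minimal ReMax sketch (hypothetical helper names, not the PipelineRL code).

def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """ReMax advantage: the reward of a greedy-decoded response serves as
    the baseline, so no learned value network is needed."""
    return sampled_reward - greedy_reward


def remax_loss(token_logprobs: list[float], sampled_reward: float,
               greedy_reward: float) -> float:
    """REINFORCE-style loss with the ReMax baseline and no KL penalty:
    minimize the negative advantage-weighted log-likelihood of the
    sampled response."""
    advantage = remax_advantage(sampled_reward, greedy_reward)
    return -advantage * sum(token_logprobs)


# Example: a sampled response that beats the greedy baseline (reward 1 vs 0)
# gets its log-likelihood pushed up.
loss = remax_loss([-0.5, -0.5], sampled_reward=1.0, greedy_reward=0.0)
```

Because the baseline is just one extra greedy decode per prompt, ReMax avoids training a separate value model, which is one reason it suits small models like this 0.5B one.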

Key Capabilities

  • Enhanced Mathematical Reasoning: Fine-tuned on gsm8k_train and math_train datasets to improve problem-solving in mathematics.
  • ReMax Algorithm: Uses the reward of a greedy-decoded response as the advantage baseline, which reduces gradient variance without a learned value network and focuses training on direct reward maximization.
  • Performance on Math Benchmarks: Achieves an overall pass@1 of 47.71% and pass@32 of 86.42% across the GSM8K and MATH-500 test sets, strong results for a model of this size.
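The card does not state exactly how the pass@k numbers were computed, but the standard unbiased estimator (drawing n samples per problem, of which c are correct) is a reasonable reference point:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    responses drawn (without replacement) from n samples is correct,
    given c of the n samples are correct."""
    if n - c < k:
        return 1.0  # not enough incorrect samples to fill k slots
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 32 samples per problem, 16 correct.
p1 = pass_at_k(32, 16, 1)    # = 0.5
p32 = pass_at_k(32, 16, 32)  # = 1.0 (some correct sample always included)
```

The large gap between pass@1 (47.71%) and pass@32 (86.42%) suggests the model often reaches a correct solution within a few dozen samples even when its single greedy attempt fails.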

Good For

  • Mathematical Problem Solving: Ideal for applications requiring accurate, step-by-step solutions to arithmetic and algebraic problems.
  • Research in RL for Reasoning: Provides a practical example of ReMax application for improving specific reasoning skills in LLMs.
  • Resource-Constrained Environments: As a 0.5B parameter model, it offers a compact solution for mathematical reasoning where larger models might be impractical.
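As a rough back-of-envelope check on the resource-constrained claim, the BF16 weight footprint can be estimated directly from the parameter count (weights only; this ignores KV cache, activations, and framework overhead):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Weights-only memory in GiB. BF16 stores 2 bytes per parameter;
    excludes KV cache, activations, and runtime overhead."""
    return n_params * bytes_per_param / 1024**3


# 0.5B parameters in BF16 -> roughly 0.93 GiB of weights.
footprint = weight_memory_gib(0.5e9)
```

At under 1 GiB of weights, the model fits comfortably on consumer GPUs and many CPU-only machines, which is what makes it practical where larger math-reasoning models are not.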