jaygala24/Qwen2.5-3B-GRPO-math-reasoning
Text Generation | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Published: Apr 6, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The jaygala24/Qwen2.5-3B-GRPO-math-reasoning model is a 3.1 billion parameter variant of Qwen2.5-3B, fine-tuned using Group Relative Policy Optimization (GRPO) without a KL penalty. It is optimized for mathematical reasoning and was trained on the GSM8K and MATH datasets. It is designed to solve arithmetic and more advanced mathematical problems step by step.


Overview

jaygala24/Qwen2.5-3B-GRPO-math-reasoning is a fine-tuned version of the 3.1 billion parameter Qwen2.5-3B base model. The fine-tuning uses reinforcement learning (GRPO without a KL penalty) to improve mathematical reasoning.

Key Capabilities

  • Mathematical Reasoning: Trained specifically to solve mathematical problems, from arithmetic to harder multi-step problems.
  • GRPO Fine-tuning: Fine-tuned with Group Relative Policy Optimization (GRPO), a reinforcement learning technique, applied here without a KL penalty.
  • Step-by-Step Solutions: Designed to generate detailed, step-by-step reasoning for mathematical queries, ending in a final answer.
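
The group-relative part of GRPO can be illustrated with a minimal sketch. Several solutions are sampled for the same problem, each receives a reward, and each reward is then scored relative to the rest of its group. The sketch assumes a binary correctness reward and standard-deviation normalization; neither detail is confirmed by this model card, and the function name is hypothetical. It is not the PipelineRL implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled solution relative to the others for the same prompt.

    Assumption: advantages are (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled solutions to one problem,
# reward 1.0 when the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct samples get a positive advantage, incorrect ones a negative one.
print(advantages)
```

When every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal. "Without a KL penalty" means the training objective omits the term that would otherwise penalize divergence from a reference policy.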

Training Details

The model was trained using the PipelineRL framework. The training datasets were gsm8k_train and math_train, with evaluation on gsm8k_test and math_500. Key hyperparameters included a learning rate of 1e-6, a maximum of 1500 training steps, and a sequence length of 8192 tokens. Training used bf16 precision and DeepSpeed ZeRO Stage 3 for memory efficiency.
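
For reference, the settings above can be collected in one place as a rough sketch. The key names in `train_settings` are hypothetical and do not reflect PipelineRL's actual configuration schema, which this card does not show. The `deepspeed_config` fragment uses standard DeepSpeed configuration keys, but the full configuration used for this run is not documented here.

```python
# Hyperparameters stated in this card; the key names are illustrative only.
train_settings = {
    "learning_rate": 1e-6,
    "max_train_steps": 1500,
    "seq_length": 8192,
    "train_datasets": ["gsm8k_train", "math_train"],
    "eval_datasets": ["gsm8k_test", "math_500"],
}

# Minimal DeepSpeed fragment enabling bf16 and ZeRO Stage 3.
# A real run would typically set further options (batch sizes, offloading).
deepspeed_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}
```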

When to Use This Model

This model is suited to applications that need mathematical problem-solving with detailed reasoning. It fits tasks where worked, explained solutions matter, such as educational tools, automated problem solvers, or data analysis support systems.
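
A minimal sketch of querying the model with the Hugging Face Transformers library is shown below. It assumes the weights are available under this model ID in a Transformers-compatible format and that `torch` and `transformers` are installed. The prompt wording in `build_prompt` is hypothetical, since this card does not document the prompt format used during training.

```python
MODEL_ID = "jaygala24/Qwen2.5-3B-GRPO-math-reasoning"


def build_prompt(question: str) -> str:
    # Hypothetical prompt wording; adjust to the format the model expects.
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {question}\n\nSolution:"
    )


def solve(question: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the generated solution is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("A train travels 60 km in 45 minutes. What is its average speed in km/h?")` downloads the weights on first use and returns the generated solution text.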