jaygala24/Qwen3-4B-DAPO-math-reasoning

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 29, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The jaygala24/Qwen3-4B-DAPO-math-reasoning model is a 4 billion parameter language model, fine-tuned from Qwen3-4B, specifically optimized for mathematical reasoning tasks. It utilizes the DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) reinforcement learning algorithm without a KL penalty. This model demonstrates strong performance on mathematical benchmarks like GSM8K and MATH-500, achieving high pass@k scores. It is designed for applications requiring accurate step-by-step mathematical problem-solving.

Loading preview...

Model Overview

This model, jaygala24/Qwen3-4B-DAPO-math-reasoning, is a 4 billion parameter language model derived from the Qwen3-4B base model. It has been specifically fine-tuned using the DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) reinforcement learning algorithm to excel in mathematical reasoning tasks. The training process leveraged the PipelineRL framework and focused on datasets like gsm8k_train and math_train.

Key Capabilities & Features

  • Specialized Mathematical Reasoning: Optimized for solving complex math problems step-by-step.
  • DAPO Algorithm: Utilizes an advanced RL algorithm that extends GRPO with features like asymmetric PPO clipping, dynamic sampling, token-level loss aggregation, and overlong reward shaping.
  • High Performance on Math Benchmarks: Achieves impressive pass@k scores on standard mathematical datasets:
    • GSM8K (test): 90.40 pass@1, 98.18 pass@32
    • MATH-500: 77.98 pass@1, 95.40 pass@32
    • Overall: 86.98 pass@1, 97.42 pass@32
  • Context Length: Supports a sequence length of 8192 tokens during training, indicating a robust context understanding.

When to Use This Model

This model is ideal for use cases requiring reliable and accurate mathematical problem-solving. Developers should consider this model for:

  • Automated Math Solvers: Applications that need to generate step-by-step solutions to arithmetic and algebraic problems.
  • Educational Tools: Integrating into platforms that assist students with math homework or provide detailed explanations.
  • Quantitative Analysis: Tasks involving numerical reasoning and data interpretation where precise calculations are critical.

It is particularly well-suited for scenarios where strong performance on benchmarks like GSM8K and MATH-500 is a priority.