jaygala24/Qwen3-1.7B-DAPO-math-reasoning

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32K · Published: Apr 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The jaygala24/Qwen3-1.7B-DAPO-math-reasoning model is a 1.7 billion parameter Qwen3-based language model, fine-tuned by jaygala24 using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. It is optimized specifically for mathematical reasoning and performs strongly on benchmarks such as GSM8K and MATH-500. The model supports a 32K-token context window and is intended for applications that require robust step-by-step mathematical problem-solving.


Model Overview

The jaygala24/Qwen3-1.7B-DAPO-math-reasoning model is a specialized 1.7 billion parameter language model built on the Qwen3 architecture. It was fine-tuned with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty to strengthen its mathematical reasoning abilities. Training used the PipelineRL framework and focused on the gsm8k and math_train datasets.
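
For reference, below is a minimal inference sketch using the Hugging Face transformers library. The prompt and sampling parameters are illustrative assumptions, not settings reported by the model author.

```python
# Minimal inference sketch (assumes a recent transformers release with Qwen3 support
# and a BF16-capable device).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen3-1.7B-DAPO-math-reasoning"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then half as many "
               "in May. How many clips did she sell altogether?",
}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative sampling settings; increase max_new_tokens for longer reasoning chains.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True,
                         temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```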

Key Capabilities & Performance

This model excels at mathematical problem-solving, as evidenced by its results on standard benchmarks (the pass@k metric is illustrated in the sketch after this list):

  • GSM8K (test): Achieves a pass@32 score of 97.57% and pass@1 of 80.97%.
  • MATH-500: Reaches a pass@32 score of 91.60% and pass@1 of 65.77%.
  • Overall: Demonstrates a combined pass@32 of 95.93% across 1819 problems.
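
The pass@k figures above follow the usual definition: the probability that at least one of k sampled completions solves a problem. The helper below is a hypothetical illustration of the standard unbiased pass@k estimator; the reported scores come from the model author's own evaluation, not from this snippet.

```python
# Standard unbiased pass@k estimator, shown for illustration only.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled completions is correct,
    given c correct completions out of n samples for a problem."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 26 correct out of 32 samples for one problem.
print(pass_at_k(n=32, c=26, k=1))   # 0.8125
print(pass_at_k(n=32, c=26, k=32))  # 1.0
```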

The DAPO algorithm, which extends GRPO with asymmetric ("clip-higher") PPO clipping and dynamic sampling, was applied with a training sequence length of 8192 tokens; the clipping change is sketched below.
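
To make the clipping change concrete, here is a minimal sketch of a token-level clipped surrogate with decoupled clip ranges. The tensor names and epsilon values are illustrative assumptions; this is not the fine-tune's actual training code.

```python
# Sketch of the asymmetric ("clip-higher") clipped surrogate used by DAPO-style training.
import torch

def dapo_clip_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Token-level clipped policy-gradient surrogate with decoupled clip ranges.

    The importance ratio is clipped to [1 - eps_low, 1 + eps_high]; the larger upper
    bound leaves more headroom for up-weighting low-probability tokens, which helps
    preserve exploration during RL fine-tuning. No KL penalty term is added.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```

Dynamic sampling, the other DAPO ingredient, filters out prompts whose sampled completions are all correct or all incorrect before computing group-normalized advantages, so every batch contributes useful gradient signal.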

Good For

  • Mathematical Reasoning: Ideal for tasks requiring step-by-step mathematical problem-solving.
  • Academic Research: Useful for researchers exploring advanced reinforcement learning techniques like DAPO for fine-tuning LLMs.
  • Educational Tools: Can be integrated into applications designed to assist with or evaluate mathematical understanding.