jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning

TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Apr 6, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning is a 1.7 billion parameter Qwen3 model fine-tuned by jaygala24 for enhanced mathematical reasoning. It utilizes Group Relative Policy Optimization (GRPO) with a KL penalty, trained on GSM8K and MATH datasets. This model is specifically optimized to achieve high pass@k scores on complex math reasoning benchmarks, making it suitable for applications requiring robust arithmetic and logical problem-solving capabilities.

Loading preview...

Model Overview

This model, jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning, is a specialized fine-tuned version of the Qwen3-1.7B base model. Its primary distinction lies in its training methodology, employing Group Relative Policy Optimization (GRPO) with a KL penalty via the PipelineRL framework, specifically targeting mathematical reasoning tasks.

Key Capabilities & Training

  • Mathematical Reasoning: Optimized for solving complex math problems, as evidenced by its strong performance on benchmarks.
  • Reinforcement Learning Fine-tuning: Leverages GRPO with a KL coefficient of 0.001 and a PPO policy loss, enhancing its ability to generate correct step-by-step reasoning.
  • Dataset Focus: Trained on gsm8k_train and math_train datasets, with evaluation on gsm8k_test and math_500.
  • Performance: Achieves notable pass@k scores, including an 80.07% pass@1 on GSM8K (test) and 69.64% pass@1 on MATH-500, with overall pass@32 reaching 95.16% across both datasets.
  • Technical Stack: Built using PipelineRL, Transformers, and DeepSpeed (ZeRO Stage 3) for efficient training.

Use Cases

This model is particularly well-suited for applications requiring accurate and detailed mathematical problem-solving. Developers should consider this model for:

  • Automated Math Tutors: Generating step-by-step solutions for arithmetic and algebraic problems.
  • Quantitative Analysis: Assisting in tasks that demand precise numerical reasoning.
  • Educational Tools: Providing explanations and answers to mathematical queries.