jaygala24/Qwen3-4B-GRPO-KL-math-reasoning
jaygala24/Qwen3-4B-GRPO-KL-math-reasoning is a fine-tuned version of the Qwen3-4B causal language model, specifically optimized for mathematical reasoning tasks. This model leverages Group Relative Policy Optimization (GRPO) with a KL penalty, trained on datasets like GSM8K and MATH-500. It demonstrates strong performance on mathematical benchmarks, achieving an overall pass@1 of 87.15% and pass@32 of 96.10% across GSM8K and MATH-500 datasets. Its primary strength lies in accurately solving complex math problems through step-by-step reasoning.
Loading preview...
jaygala24/Qwen3-4B-GRPO-KL-math-reasoning: Enhanced Mathematical Reasoning
This model is a specialized fine-tune of the Qwen3-4B base model, developed by jaygala24, focusing on advanced mathematical reasoning capabilities. It utilizes Group Relative Policy Optimization (GRPO) with a KL penalty, a reinforcement learning technique, to significantly improve its performance on complex math problems.
Key Capabilities & Training
- Mathematical Reasoning: Specifically trained and optimized for solving mathematical problems, including arithmetic and word problems.
- GRPO Fine-tuning: Employs GRPO with a KL coefficient of 0.001 and a policy loss of
ppofor robust policy optimization. - Comprehensive Training Data: Fine-tuned on a combination of
gsm8k_trainandmath_traindatasets, ensuring exposure to a wide range of mathematical challenges. - High Sequence Length: Trained with a sequence length of 8192, allowing for processing longer problem descriptions and reasoning steps.
Performance Highlights
Evaluated on standard mathematical benchmarks, the model demonstrates strong results:
- GSM8K (test): Achieves a pass@1 of 89.47% and pass@32 of 96.13%.
- MATH-500: Achieves a pass@1 of 81.04% and pass@32 of 96.00%.
- Overall: Boasts an impressive overall pass@1 of 87.15% and pass@32 of 96.10% across 1819 problems.
Ideal Use Cases
This model is particularly well-suited for applications requiring accurate and detailed step-by-step mathematical problem-solving, such as:
- Educational tools for math assistance.
- Automated problem solvers for quantitative tasks.
- Research in improving LLM mathematical reasoning.