jaygala24/Qwen3-4B-GRPO-KL-math-reasoning is a 4-billion-parameter language model, fine-tuned from Qwen3-4B and optimized for mathematical reasoning tasks. It was trained with Group Relative Policy Optimization (GRPO) with a KL penalty on the `gsm8k` and `math` datasets to strengthen step-by-step problem solving. The model is designed to produce accurate, well-reasoned answers to mathematical queries, with a 32768-token context length.
Qwen3-4B-GRPO-KL-math-reasoning Overview
This model is a specialized 4-billion-parameter language model, derived from the Qwen3-4B architecture and fine-tuned by jaygala24. Its distinguishing feature is optimization for mathematical reasoning via Group Relative Policy Optimization (GRPO) with a KL penalty.
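The core idea of GRPO can be sketched in a few lines: rewards for a group of sampled completions are normalized against the group's mean and standard deviation to form advantages, and a KL term penalizes drift from the reference model. This is an illustrative sketch, not the training code; the function names and the `beta` value are assumptions.

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each completion's reward
    against its group's mean and standard deviation (the core of GRPO)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard: identical rewards give std 0
    return [(r - mean) / std for r in rewards]

def kl_penalized_objective(advantage, logprob_ratio, ref_kl, beta=0.04):
    """Per-token surrogate objective: policy-gradient term minus a KL
    penalty keeping the policy near the reference model (beta is illustrative)."""
    return logprob_ratio * advantage - beta * ref_kl
```

A group where two of four completions earn reward 1.0 yields advantages of +1.0 for the rewarded samples and -1.0 for the rest, so the update pushes probability mass toward the better half of the group.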
Key Capabilities
- Enhanced Mathematical Reasoning: Specifically trained to process and solve mathematical problems step-by-step.
- GRPO Fine-tuning: Leverages the GRPO algorithm with a KL penalty for improved policy optimization during training.
- Dataset Focus: Trained on `gsm8k_train` and `math_train` datasets, targeting common mathematical problem types.
- High Context Length: Supports a sequence length of 8192 tokens, allowing for complex problem descriptions and reasoning chains.
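Training on these datasets typically relies on a verifiable reward: GSM8K reference solutions end with a `#### <answer>` line, which a reward function can parse and compare against the model's output. The helper names below are illustrative, not part of this model's released code.

```python
import re

def extract_gsm8k_answer(text):
    """Pull the final numeric answer from a GSM8K-style solution,
    which marks it with '#### <answer>' (commas stripped)."""
    match = re.search(r"####\s*(-?[\d,\.]+)", text)
    if match is None:
        return None
    return match.group(1).replace(",", "")

def exact_match_reward(completion, reference):
    """Binary reward for RL fine-tuning: 1.0 on an exact answer match."""
    pred = extract_gsm8k_answer(completion)
    gold = extract_gsm8k_answer(reference)
    return 1.0 if pred is not None and pred == gold else 0.0
```

An exact-match reward like this is what makes math a convenient domain for GRPO: correctness is cheap to verify automatically, so no learned reward model is required.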
When to Use This Model
- Mathematical Problem Solving: Ideal for applications requiring accurate and reasoned solutions to arithmetic and mathematical challenges.
- Educational Tools: Can be integrated into systems for generating explanations or verifying steps in math problems.
- Research in RL for LLMs: Provides an example of GRPO application in fine-tuning for specific reasoning tasks.
This model is a strong candidate for use cases where robust and verifiable mathematical reasoning is a primary requirement, offering a focused approach compared to general-purpose LLMs.