jaygala24/Qwen3-4B-GRPO-math-reasoning

Text Generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Apr 6, 2026 · License: apache-2.0 · Architecture: Transformer (open weights)

jaygala24/Qwen3-4B-GRPO-math-reasoning is a 4-billion-parameter model fine-tuned from Qwen3-4B by jaygala24 using Group Relative Policy Optimization (GRPO) without a KL penalty. It is optimized for mathematical reasoning, generating step-by-step solutions and achieving strong results on benchmarks such as GSM8K and MATH-500.


Model Overview

This model, jaygala24/Qwen3-4B-GRPO-math-reasoning, is a specialized fine-tune of the Qwen3-4B base model. It has been optimized for mathematical reasoning using Group Relative Policy Optimization (GRPO) without a KL penalty, leveraging the PipelineRL framework.
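The distinguishing feature of GRPO is that it replaces a learned value model with group-relative advantages: each prompt is sampled several times, and each completion's reward is normalized against the mean and standard deviation of its own group. A minimal sketch of that advantage computation (the exact normalization and epsilon are assumptions; this is the commonly used formulation, not this repo's verified training code):

```python
from statistics import mean, stdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each sampled completion's
    reward by the mean and (sample) std of its group, so no separate
    value model is needed."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group of 4 sampled solutions to one math problem, scored 1.0 if
# the final answer is correct and 0.0 otherwise.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Correct samples get positive advantage, incorrect ones negative.
```

With the KL coefficient set to 0.0 (as in this fine-tune), these advantages feed directly into the PPO-style policy loss with no penalty term pulling the policy back toward the reference model.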

Key Capabilities & Training

  • Mathematical Reasoning: Specifically trained on gsm8k_train and math_train datasets to enhance its ability to solve mathematical problems.
  • GRPO Optimization: Uses GRPO with a PPO policy loss and a KL coefficient of 0.0 (i.e., no KL penalty against the reference model), favoring direct policy improvement.
  • Performance: Achieves notable pass@1 scores of 89.11% on GSM8K (test) and 79.90% on MATH-500, with overall pass@32 reaching 95.66% across both datasets.
  • Training Details: Trained for 1500 steps with a sequence length of 8192 and an effective batch size of 256, using DeepSpeed ZeRO Stage 3 for efficiency.
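The pass@k figures above are typically computed with the standard unbiased estimator (1 − C(n−c, k)/C(n, k), where n samples are drawn per problem and c are correct). A minimal sketch, assuming this model card uses that standard estimator rather than a custom evaluation script:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k completions,
    drawn without replacement from n samples of which c are correct,
    solves the problem."""
    if n - c < k:
        # Fewer incorrect samples than k draws: a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers (not from the card): 32 samples, 20 correct.
p1 = pass_at_k(32, 20, 1)    # reduces to the raw accuracy 20/32
p32 = pass_at_k(32, 20, 32)  # all samples drawn, so this is 1.0
```

Per-problem values are then averaged over the benchmark to produce the reported percentages.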

Good For

  • Applications requiring accurate step-by-step mathematical problem-solving.
  • Tasks involving arithmetic, algebra, and other quantitative reasoning.
  • Developers looking for a Qwen3-4B variant optimized for numerical and logical deduction.