Frugal-Math-4B is a 4-billion-parameter, reasoning-optimized variant of Qwen3-4B-Thinking-2507, developed by MBZUAI-Paris. Trained with Reinforcement Learning with Verifiable Rewards (RLVR) and a 40,960-token context length, it specializes in generating concise, verifiable mathematical solutions. The model achieves large reductions in reasoning length while matching or improving accuracy on challenging math benchmarks, making it well suited to efficient mathematical reasoning tasks.
Frugal-Math-4B: Efficient Mathematical Reasoning
Frugal-Math-4B, developed by MBZUAI-Paris, is a 4-billion-parameter model based on Qwen3-4B-Thinking-2507 and specifically optimized for mathematical reasoning. It leverages Reinforcement Learning with Verifiable Rewards (RLVR) and a novel training approach that uses "easy samples as length regularizers" to achieve emergent brevity.
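The "verifiable rewards" in RLVR come from programmatically checking the model's final answer rather than from a learned reward model. The sketch below is an illustration of that idea, not the authors' code; it assumes the final answer appears in a LaTeX `\boxed{...}` wrapper, which is a common convention for competition-style math but is not stated in this card.

```python
# Illustrative verifiable-reward function for RLVR (simplified sketch).
# Assumption: the completion marks its final answer as \boxed{...}.
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the last \\boxed{...} answer matches the ground truth, else 0.0."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not matches:
        return 0.0  # no parseable final answer -> no reward
    return 1.0 if matches[-1].strip() == ground_truth.strip() else 0.0
```

Because the reward is a binary, automatically checkable signal, it scales to large curated math datasets without human labeling.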
Key Capabilities & Differentiators
- Concise Reasoning: The model learns to generate significantly shorter, verifiable mathematical solutions without explicit length penalties, reducing average output length by 50-60% compared to its base model.
- High Accuracy: Despite its brevity, Frugal-Math-4B (Stage 2) outperforms all 4B-class baselines in both accuracy and efficiency, achieving an average Efficiency-Adjusted Accuracy (EAA) of 52.86% across diverse math benchmarks.
- Efficiency-Adjusted Accuracy (EAA): The accompanying evaluation introduces a metric that jointly scores accuracy and brevity, penalizing unnecessarily long reasoning chains.
- Robust Training: Trained with Group Relative Policy Optimization (GRPO) on a curated mix of math datasets, including a filtered subset of DeepMath-103k, over two stages focused on brevity and progressive learning.
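GRPO's defining step is computing a *group-relative* advantage: several completions are sampled per prompt, each is scored (here, by a verifiable reward), and rewards are normalized within the group so no separate value network is needed. A minimal sketch of that normalization, simplified for illustration and not the training code:

```python
# Group-relative advantage as used in GRPO (simplified sketch).
# Each completion's advantage = (reward - group mean) / group std.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize per-completion rewards within one sampled group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0  # degenerate single-sample group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under this scheme, short correct answers and long correct answers initially earn the same reward; the "easy samples as length regularizers" effect described above shifts the distribution toward brevity without an explicit length penalty term.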
Ideal Use Cases
- Verifiable Mathematical Reasoning: Excels at competition-style math problems requiring precise and verifiable solutions.
- Efficiency-Accuracy Trade-off Studies: Useful for research and applications focused on optimizing the balance between solution accuracy and computational efficiency in RLHF/RLVR contexts.
While highly effective for math, its generalization to other domains is an area of ongoing research.
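A quick-start inference sketch is below. The Hub repo id is an assumption pieced together from the model and organization names in this card (verify it before use), and the snippet follows the standard `transformers` chat-template flow rather than any model-specific API.

```python
# Hypothetical quick-start sketch; repo id below is an assumption, not verified.
MODEL_ID = "MBZUAI-Paris/Frugal-Math-4B"

def solve(problem: str, max_new_tokens: int = 4096) -> str:
    """Generate a concise solution via the model's chat template."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": problem}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, dropping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Because the model was trained for brevity, a modest `max_new_tokens` budget should usually suffice for competition-style problems.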