Frugal-Math-4B: Efficient Mathematical Reasoning
Overview
Frugal-Math-4B, developed by MBZUAI-Paris, is a 4-billion-parameter model based on Qwen3-4B-Thinking-2507 and optimized specifically for mathematical reasoning. It is trained with Reinforcement Learning with Verifiable Rewards (RLVR) using a novel approach that treats "easy samples as length regularizers," yielding emergent brevity.
Key Capabilities & Differentiators
- Concise Reasoning: The model learns to generate significantly shorter, verifiable mathematical solutions without explicit length penalties, reducing average output length by 50-60% compared to its base model.
- High Accuracy: Despite its brevity, the Stage-2 Frugal-Math-4B checkpoint outperforms all 4B-class baselines in both accuracy and efficiency, achieving an average Efficiency-Adjusted Accuracy (EAA) of 52.86% across diverse math benchmarks.
- Efficiency-Adjusted Accuracy (EAA): The accompanying work introduces EAA, a metric that jointly evaluates accuracy and brevity, penalizing unnecessarily long reasoning chains.
- Robust Training: Trained using Group Relative Policy Optimization (GRPO) on a curated mix of math datasets, including a filtered subset of DeepMath-103k, across two stages focusing on brevity and progressive learning.
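The exact EAA formula is not reproduced in this card, so the sketch below is only an illustration of the idea: accuracy discounted by a penalty on tokens beyond a length budget. The function name, the `budget` and `alpha` parameters, and the exponential penalty shape are all assumptions for illustration, not the published metric.

```python
import math

def efficiency_adjusted_accuracy(correct, n_tokens, budget=2048, alpha=0.5):
    """Hypothetical efficiency-adjusted accuracy.

    Illustrative only: a correct answer earns credit 1.0, discounted
    exponentially by how far its token count exceeds a budget. The
    published EAA metric may differ in functional form.
    """
    scores = []
    for ok, t in zip(correct, n_tokens):
        # No penalty while within budget; smooth decay beyond it.
        penalty = math.exp(-alpha * max(0, t - budget) / budget)
        scores.append((1.0 if ok else 0.0) * penalty)
    return sum(scores) / len(scores)
```

Under any such formulation, a correct-but-verbose solution scores strictly lower than an equally correct concise one, which is the trade-off the metric is meant to surface.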
Ideal Use Cases
- Verifiable Mathematical Reasoning: Excels at competition-style math problems requiring precise and verifiable solutions.
- Efficiency-Accuracy Trade-off Studies: Useful for research and applications focused on optimizing the balance between solution accuracy and computational efficiency in RLHF/RLVR contexts.
While highly effective for math, its generalization to other domains is an area of ongoing research.
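In RLVR, a verifier typically assigns each sampled solution a binary reward (correct/incorrect), and GRPO normalizes those rewards within each group of samples drawn for the same prompt, avoiding a learned critic. The sketch below shows the standard group-relative advantage computation; it is a generic GRPO formulation, not code from the Frugal-Math release.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages as in GRPO.

    Each sampled solution's reward is normalized against the mean and
    standard deviation of its own group, so solutions better than the
    group average get positive advantage and worse ones negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With binary verifiable rewards, a group where every sample is correct (or every sample wrong) yields all-zero advantages, so only prompts of intermediate difficulty produce gradient signal; this is one reason data curation and staged difficulty, as described above, matter for GRPO training.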