MBZUAI-Paris/Frugal-Math-4B

4B parameters · BF16 · 40960 · License: apache-2.0
Overview

Frugal-Math-4B: Efficient Mathematical Reasoning

Frugal-Math-4B, developed by MBZUAI-Paris, is a 4-billion-parameter model based on Qwen3-4B-Thinking-2507 and optimized specifically for mathematical reasoning. It is trained with Reinforcement Learning with Verifiable Rewards (RLVR) using a novel approach that treats "easy samples as length regularizers" to achieve emergent brevity.
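
A minimal usage sketch with the Hugging Face transformers library is shown below. It assumes the checkpoint inherits the standard chat-template conventions of its Qwen3-4B-Thinking-2507 base; the prompt format and generation settings are illustrative, not official.

```python
# Minimal sketch: loading Frugal-Math-4B with transformers.
# Assumes the checkpoint follows its Qwen3 base model's chat template;
# generation settings here are illustrative, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MBZUAI-Paris/Frugal-Math-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Find all real x with x^2 - 5x + 6 = 0."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```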

Key Capabilities & Differentiators

  • Concise Reasoning: The model learns to generate significantly shorter, verifiable mathematical solutions without explicit length penalties, cutting average output length by 50-60% relative to its base model.
  • High Accuracy: Despite this brevity, Frugal-Math-4B (Stage 2) outperforms all 4B-class baselines in both accuracy and efficiency, reaching an average Efficiency-Adjusted Accuracy (EAA) of 52.86% across diverse math benchmarks.
  • Efficiency-Adjusted Accuracy (EAA): The accompanying work introduces a metric that jointly evaluates accuracy and brevity, penalizing unnecessarily long reasoning chains (an illustrative formulation is sketched after this list).
  • Robust Training: Trained with Group Relative Policy Optimization (GRPO) on a curated mix of math datasets, including a filtered subset of DeepMath-103k, in two stages that emphasize brevity and progressive learning.
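
The exact EAA formula is not spelled out in this card. The sketch below assumes one plausible form, purely for illustration: per-sample accuracy discounted by a length penalty relative to a reference output length. The function name, penalty rule, and reference length are all assumptions, not the published definition.

```python
# Hypothetical sketch of an Efficiency-Adjusted-Accuracy-style metric.
# The actual formula used for Frugal-Math-4B may differ; this assumes
# accuracy discounted by relative output length.
def eaa(correct: list[bool], lengths: list[int], ref_length: float) -> float:
    """Average per-sample accuracy, discounted when a solution is longer
    than a reference length (e.g., the base model's average).
    All names and the discount rule here are illustrative assumptions."""
    scores = []
    for ok, n_tokens in zip(correct, lengths):
        # Penalize only outputs longer than the reference; shorter
        # outputs keep full credit for a correct answer.
        penalty = min(1.0, ref_length / n_tokens)
        scores.append(float(ok) * penalty)
    return sum(scores) / len(scores)

# Example: 3 of 4 answers correct, one of them needlessly long.
print(eaa([True, True, False, True], [300, 250, 900, 1200], ref_length=600.0))
```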

Ideal Use Cases

  • Verifiable Mathematical Reasoning: Excels at competition-style math problems requiring precise and verifiable solutions.
  • Efficiency-Accuracy Trade-off Studies: Useful for research and applications that optimize the balance between solution accuracy and computational cost in RLHF/RLVR settings (a measurement sketch follows this list).
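
For such studies, a simple starting point is to compare generated token counts between the base model and Frugal-Math-4B on the same prompts. The helper below is a hedged sketch: count_output_tokens is a hypothetical name, and model/tokenizer loading follows the earlier usage example.

```python
# Hypothetical helper for comparing output lengths across models on the
# same prompt; load model and tokenizer as in the usage sketch above.
def count_output_tokens(model, tokenizer, prompt: str,
                        max_new_tokens: int = 4096) -> int:
    """Generate a completion and return how many new tokens it used."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # New tokens = total sequence length minus the prompt length.
    return outputs.shape[-1] - inputs.shape[-1]
```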

While the model is highly effective on mathematics, its generalization to other domains remains an area of ongoing research.