Nanbeige4.1-3B: A Small General Model for Reasoning and Agentic Tasks
Nanbeige4.1-3B is an enhanced 3 billion parameter model, developed by Chen Yang et al., that significantly improves upon its predecessor, Nanbeige4-3B-Thinking-2511, through supervised fine-tuning (SFT) and reinforcement learning (RL). This model demonstrates that compact architectures can achieve high performance across multiple critical areas.
Key Capabilities and Differentiators
- Strong Reasoning: Nanbeige4.1-3B is adept at solving complex, multi-step problems, consistently producing correct answers on challenging benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. It significantly outperforms same-scale models and even larger models like Qwen3-30B-A3B and Qwen3-32B in various code, math, and science reasoning tasks.
- Robust Preference Alignment: The model exhibits solid alignment performance, surpassing not only same-scale competitors (e.g., Qwen3-4B-2507) but also substantially larger models (e.g., Qwen3-30B-A3B) on benchmarks such as Arena-Hard-v2 and Multi-Challenge.
- Advanced Agentic Capability: Uniquely for a small general model, Nanbeige4.1-3B natively supports deep-search tasks and can reliably sustain complex problem-solving involving over 500 rounds of tool invocations. This capability fills a notable gap, as most small models are typically optimized for either general reasoning or agentic scenarios, but rarely both.
Performance Highlights
Nanbeige4.1-3B shows superior performance across general reasoning and deep-search benchmarks. For instance, it achieves 76.9 on Live-Code-Bench-V6 (compared to 66.0 for Qwen3-30B-A3B-2507) and 87.40 on AIME 2026 I (compared to 87.30 for Qwen3-30B-A3B). In deep-search tasks, it scores 75 on xBench-DeepSearch-2505, significantly outperforming other small foundation models like Qwen3-4B-2507 (34) and even competing with specialized small agents.
Limitations
While safety is emphasized during training, the probabilistic nature and size of the model mean it may occasionally generate unexpected or harmful content. Users are advised to exercise caution and not propagate such outputs.