QED-Nano SFT: Mathematical Reasoning Model
QED-Nano SFT is a 4 billion parameter model developed by lm-provers, designed specifically for mathematical reasoning and theorem proving. It is a supervised fine-tuning (SFT) checkpoint that serves as the foundational stage for the more advanced QED-Nano model, which adds reinforcement learning and a reasoning cache.
Key Capabilities & Features
- Mathematical Reasoning: Optimized for olympiad-level proof problems, generating proofs with explicit chain-of-thought reasoning within <think> tags.
- Knowledge Distillation: Achieves strong performance at 4B parameters, distilled from the much larger 685B-parameter DeepSeek-Math-V2.
- Open Weights & Training Details: Fully open-source with public data mixture and training configurations.
- Performance: Outperforms Qwen3-4B-Thinking-2507 on IMO-ProofBench (39.5 vs 20.4) and ProofBench (33.3 vs 19.5).
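Because the model emits its reasoning inside <think> tags, downstream code usually needs to separate the chain-of-thought trace from the final proof. A minimal sketch, assuming the trace is wrapped in a single <think>...</think> pair (as described above) and that the final proof follows the closing tag; the helper name is illustrative, not part of any published API:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split raw model output into (chain-of-thought, final proof).

    Assumes the reasoning trace is wrapped in <think>...</think> and
    the final answer follows the closing tag. If no tags are found,
    the whole output is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if m is None:
        return "", output.strip()  # no reasoning trace found
    thought = m.group(1).strip()
    answer = output[m.end():].strip()
    return thought, answer

# Hypothetical output in the format the card describes:
sample = "<think>Try induction on n.</think>\nProof: By induction on n, ..."
thought, proof = split_reasoning(sample)
```

The non-greedy match plus `re.DOTALL` keeps multi-line traces intact while stopping at the first closing tag.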
Limitations & When to Use
This SFT-only model has critical limitations:
- Length explosion: responses can exceed 100k tokens.
- Correctness: it imitates reasoning style but is not optimized for correctness, so proofs may be flawed.
- No self-correction: it lacks the self-correction capabilities present in the full QED-Nano model.
- Scope: it is domain-specific to theorem proving, English-only, and has no visual capabilities.
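Given the length-explosion issue, callers may want a hard guard on raw output size in addition to any generation-time token limit. A minimal, illustrative sketch: the character budget and the preference for cutting at a </think> boundary are assumptions for this example, not part of the model's interface:

```python
def cap_output(text: str, max_chars: int = 8000) -> str:
    """Hard-truncate a runaway generation to at most max_chars.

    When the budget is exceeded, prefer cutting at the last complete
    </think> tag inside the budget so the reasoning trace stays
    well-formed; otherwise cut at the raw character limit.
    """
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    close = head.rfind("</think>")
    if close != -1:
        return head[: close + len("</think>")]
    return head
```

In practice this complements, rather than replaces, a token limit passed to the generation call.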
Should you use this for your use case?
- Yes, if: You are researching or developing mathematical reasoning systems and need a strong SFT baseline for theorem proving, or if you plan to further fine-tune with RL. It's a good starting point for understanding distilled mathematical reasoning.
- No, if: You require a production-ready model for mathematical proofs where correctness and controlled output length are paramount. For such cases, the full lm-provers/QED-Nano model, which includes RL training and reasoning cache, is recommended. It is also not suitable for general-purpose assistant tasks or non-English applications.