linius/Qwen3-8B-SPoT
Qwen3-8B-SPoT is an 8-billion-parameter large language model developed by linius, post-trained from the Qwen/Qwen3-8B base model using the Surgical Post-Training (SPoT) paradigm. SPoT yields an average accuracy improvement of 6.2% on complex math and reasoning tasks while effectively mitigating catastrophic forgetting of general knowledge, making the model suitable for applications that need strong reasoning alongside reliable knowledge retention.
Qwen3-8B-SPoT: Reasoning-Enhanced LLM
Qwen3-8B-SPoT is an 8-billion-parameter large language model, post-trained from the Qwen/Qwen3-8B base model using the novel Surgical Post-Training (SPoT) paradigm. The approach, detailed in the paper "Surgical Post-Training: Cutting Errors, Keeping Knowledge" (Lin & Han, 2026), significantly improves reasoning capabilities while avoiding the catastrophic forgetting that typically accompanies specialized post-training.
Key Capabilities & Performance
- Enhanced Reasoning: Achieves an average accuracy improvement of 6.2% on complex in-domain and out-of-domain (OOD) math and reasoning tasks compared to its base model.
- Efficient Training: Trained with only 4k rectified math data pairs, avoiding complex multi-phase fine-tuning pipelines.
- Knowledge Retention: Demonstrates robust mitigation of catastrophic forgetting, maintaining stability on general capability benchmarks like IFEval.
- Context Length: Supports a context length of 32768 tokens.
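Because the model is decoder-only, the 32,768-token context window covers the prompt and the generated tokens together, so long reasoning traces have to be budgeted against it. A minimal bookkeeping sketch (the function names and numbers are illustrative, not part of any model API):

```python
CONTEXT_LEN = 32768  # maximum tokens (prompt + generation) supported by Qwen3-8B-SPoT


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_len: int = CONTEXT_LEN) -> bool:
    """Return True if a prompt plus its generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= context_len


def max_generation_budget(prompt_tokens: int,
                          context_len: int = CONTEXT_LEN) -> int:
    """Largest max_new_tokens value that still fits; 0 if the prompt alone overflows."""
    return max(context_len - prompt_tokens, 0)
```

For reasoning-heavy prompts it is worth leaving a generous generation budget, since chain-of-thought outputs can run to thousands of tokens.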
Good For
- Applications requiring strong mathematical and complex reasoning abilities.
- Use cases where knowledge retention alongside specialized skill enhancement is crucial.
- Developers who want a reasoning-focused model produced with an efficient post-training recipe rather than extensive multi-phase fine-tuning.
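As a post-trained Qwen3 checkpoint, the model should load through Hugging Face Transformers like any other Qwen3 model. A quick-start sketch, assuming the repository id `linius/Qwen3-8B-SPoT`, a GPU with enough memory for the 8B weights, and an illustrative math prompt:

```python
MODEL_ID = "linius/Qwen3-8B-SPoT"  # assumed Hugging Face repository id


def main() -> None:
    # Imports are local so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
    # The chat template is inherited from Qwen3; add_generation_prompt appends
    # the assistant turn so the model starts answering immediately.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=1024)
    reply = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    print(reply)


if __name__ == "__main__":
    main()
```

When prompting with long inputs, keep the combined prompt and `max_new_tokens` within the 32,768-token context window.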