linius/Qwen3-8B-SPoT

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen3-8B-SPoT is an 8-billion-parameter large language model developed by linius, post-trained from the Qwen/Qwen3-8B base model. Using the Surgical Post-Training (SPoT) paradigm, it substantially improves performance on complex math and reasoning tasks, with an average accuracy gain of 6.2%, while effectively mitigating catastrophic forgetting of general knowledge. It is designed for applications that need robust reasoning performance alongside strong knowledge retention.


Qwen3-8B-SPoT: Reasoning-Enhanced LLM

Qwen3-8B-SPoT is an 8-billion-parameter large language model, post-trained from the Qwen/Qwen3-8B base model using the novel Surgical Post-Training (SPoT) paradigm. The approach, detailed in the paper "Surgical Post-Training: Cutting Errors, Keeping Knowledge" (Lin & Han, 2026), significantly improves reasoning capability without the catastrophic forgetting that typically accompanies specialized fine-tuning.
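The paper's exact recipe isn't reproduced on this card, but the "rectified math data pairs" mentioned below can be pictured as supervised fine-tuning examples built from problems the base model initially answers incorrectly, each paired with a corrected solution. A minimal sketch under that assumption, with all helper names hypothetical and no claim to match the paper's actual pipeline:

```python
# Illustrative sketch only: assembling "rectified" (problem, corrected
# solution) pairs for supervised fine-tuning. An assumption about what
# such data could look like, not the SPoT paper's actual procedure.
from dataclasses import dataclass

@dataclass
class RectifiedPair:
    prompt: str   # the math problem posed to the model
    target: str   # verified, corrected solution used as the SFT label

def build_rectified_pairs(problems, sample_model, verify):
    """Keep only the problems the base model gets wrong.

    sample_model(problem) -> the model's attempted solution (hypothetical)
    verify(problem, attempt) -> (is_correct, corrected_solution) (hypothetical)
    """
    pairs = []
    for problem in problems:
        attempt = sample_model(problem)
        is_correct, corrected = verify(problem, attempt)
        if not is_correct:
            # "Surgical" intuition: train only where the model errs, so a
            # small dataset (the card cites ~4k pairs) targets mistakes
            # while leaving intact knowledge alone.
            pairs.append(RectifiedPair(prompt=problem, target=corrected))
    return pairs
```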

Key Capabilities & Performance

  • Enhanced Reasoning: Achieves an average accuracy improvement of 6.2% on complex in-domain and out-of-domain (OOD) math and reasoning tasks compared to its base model.
  • Efficient Training: Trained with only 4k rectified math data pairs, avoiding complex multi-phase fine-tuning pipelines.
  • Knowledge Retention: Demonstrates robust mitigation of catastrophic forgetting, maintaining stability on general capability benchmarks like IFEval.
  • Context Length: Supports a context length of 32,768 (32k) tokens; a usage sketch follows this list.
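Below is a minimal usage sketch, assuming the model keeps the standard Qwen3 chat-template workflow of its Qwen/Qwen3-8B base and loads through Hugging Face transformers; the prompt and generation settings are illustrative:

```python
# Minimal usage sketch, assuming standard Qwen3-style loading applies.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "linius/Qwen3-8B-SPoT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load weights in their published precision
    device_map="auto",
)

messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]

# enable_thinking is the Qwen3 chat-template switch for reasoning traces;
# assumed to carry over, since SPoT only post-trains the base model.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
new_tokens = outputs[0][inputs.input_ids.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

If the base model's behavior carries over, setting enable_thinking=False should yield shorter, direct answers without the reasoning trace.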

Good For

  • Applications requiring strong mathematical and complex reasoning abilities.
  • Use cases where knowledge retention alongside specialized skill enhancement is crucial.
  • Developers looking for an efficient, reasoning-focused model without extensive fine-tuning requirements.