apple/SimpleSD-4B-thinking
apple/SimpleSD-4B-thinking is a 4-billion-parameter language model initialized from Qwen3-4B-Thinking-2507 and fine-tuned with the Simple Self-Distillation (SimpleSD) method. The model is designed to improve code generation by learning from its own sampled outputs in a supervised fashion. It achieves notable gains on competitive programming benchmarks, particularly on harder problems, by resolving a precision–exploration conflict in token distributions.
Overview
apple/SimpleSD-4B-thinking is a 4-billion-parameter model developed by Apple that showcases the Simple Self-Distillation (SimpleSD) method for enhanced code generation. The technique fine-tunes a base language model on its own sampled outputs, generated with non-unit temperature and top-k/top-p truncation, without relying on rewards, verifiers, or reinforcement learning. The model is initialized from Qwen3-4B-Thinking-2507 and serves as a research checkpoint for reproducibility.
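The sampling distribution the overview describes can be sketched in a few lines. This is a minimal illustration of temperature scaling followed by top-k and top-p (nucleus) truncation over a token distribution; the function name and the specific parameter values are illustrative assumptions, not part of the released recipe.

```python
import numpy as np

def sample_distribution(logits, temperature=0.8, top_k=20, top_p=0.95):
    """Illustrative temperature + top-k/top-p truncation of a token distribution.

    Returns the renormalized probabilities a sampler would draw from.
    (Sketch only; parameter defaults here are assumptions.)
    """
    # Temperature-scaled softmax (non-unit temperature reshapes the distribution).
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    p = np.exp(z)
    p /= p.sum()

    # Rank tokens by probability, keep at most top_k of them.
    order = np.argsort(p)[::-1]
    csum = np.cumsum(p[order][:top_k])

    # Nucleus cut: smallest prefix whose cumulative mass reaches top_p.
    cutoff = min(np.searchsorted(csum, top_p) + 1, top_k)
    kept = order[:cutoff]

    # Zero out truncated tokens and renormalize the survivors.
    out = np.zeros_like(p)
    out[kept] = p[kept] / p[kept].sum()
    return out
```

With temperature below 1 the distribution sharpens toward high-probability tokens, while the top-k/top-p cuts remove the low-probability tail entirely; SimpleSD trains on sequences sampled from distributions shaped this way.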
Key Capabilities
- Improved Code Generation: Demonstrates significant gains on competitive programming benchmarks, particularly for more challenging problems.
- Self-Distillation Method: Reshapes token distributions so that a single global decoding configuration becomes more effective at evaluation time.
- Reproducible Research: Provided as a research checkpoint to facilitate reproducibility of the SimpleSD method.
Performance Highlights
Compared to its base Qwen3-4B-Thinking-2507 model, SimpleSD-4B-thinking shows improved performance on LiveCodeBench (LCB):
- LCBv6 pass@1: +3.3 points (from 54.5% to 57.8%)
- LCBv6 pass@5: +3.9 points (from 67.5% to 71.4%)
- LCBv5 pass@1: +3.5 points (from 59.6% to 63.1%)
- LCBv5 pass@5: +4.4 points (from 70.3% to 74.7%)
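The pass@1 and pass@5 figures above are standard functional-correctness metrics. As a reference point, the widely used unbiased estimator computes pass@k from n sampled completions of which c pass the tests; the short implementation below is a generic sketch of that formula, not code from this model's evaluation harness.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples drawn per problem, c: samples that pass, k: budget.
    """
    if n - c < k:
        # Fewer failures than the budget: some correct sample is always included.
        return 1.0
    # Product form avoids large binomial coefficients.
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))
```

For example, with 2 samples of which 1 is correct, pass@1 is 0.5, and averaging this quantity over all benchmark problems yields the reported percentages.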
Good for
- Researchers interested in self-distillation techniques for code generation.
- Developers looking for models optimized for competitive programming tasks.
- Practitioners exploring ways to improve code generation without complex reinforcement learning setups.