Kwai-Klear/Klear-Reasoner-8B-SFT
Kwai-Klear/Klear-Reasoner-8B-SFT is an 8-billion-parameter reasoning model developed by Kwai-Klear, featuring a 32768-token context length. It is specifically optimized for complex mathematical and coding tasks, achieving state-of-the-art performance on benchmarks like AIME and LiveCodeBench. The model incorporates a novel Gradient-Preserving Clipping Policy Optimization (GPPO) method to enhance exploration and convergence during training, making it particularly effective for problem-solving requiring careful deliberation.
Loading preview...
Klear-Reasoner-8B-SFT: Advanced Reasoning for Math and Code
Klear-Reasoner-8B-SFT is an 8-billion-parameter model from Kwai-Klear, designed for long reasoning capabilities, particularly in mathematics and coding. It demonstrates outstanding performance on challenging benchmarks such as AIME 2024/2025 and LiveCodeBench V5/V6, achieving scores up to 90.5% and 66.0% respectively with a 64K inference budget.
Key Innovations:
- Quality-centric long CoT SFT: Leverages supervised fine-tuning distilled from DeepSeek-R1-0528.
- Gradient-Preserving Clipping Policy Optimization (GPPO): A novel Reinforcement Learning (RL) method that preserves gradients from clipped tokens, significantly boosting exploration and convergence during training.
Performance Highlights:
- Achieves competitive results against other 7B-8B models on AIME and LiveCodeBench, with notable improvements when using a 64K inference budget.
- The model's training environment utilizes a sandbox for code evaluation (Firejail) and a math verification system (math_verify).
Use Cases:
- Complex Mathematical Problem Solving: Excels in advanced math competitions and tasks requiring multi-step reasoning.
- Code Generation and Debugging: Strong performance on live coding benchmarks suggests utility in programming assistance and automated code solutions.