Klear-Reasoner-8B by Kwai-Klear is an 8-billion-parameter reasoning model with a 32,768-token context length, optimized for complex mathematical and coding tasks. It combines quality-centric long chain-of-thought (CoT) supervised fine-tuning (SFT) on data distilled from DeepSeek-R1-0528 with Gradient-Preserving Clipping Policy Optimization (GPPO), a reinforcement learning (RL) method designed to improve exploration and convergence. The model reports state-of-the-art performance on challenging math and coding benchmarks, making it well suited to applications that demand advanced problem-solving.
Klear-Reasoner-8B: Advanced Reasoning for Math and Code
Kwai-Klear's Klear-Reasoner-8B is an 8-billion-parameter model engineered for robust reasoning in mathematics and coding. Its 32,768-token context window gives it room for the extensive deliberation that complex problems require.
Key Capabilities and Innovations
- State-of-the-Art Performance: Achieves leading scores on challenging benchmarks such as AIME 2024/2025 (up to 90.5%) and LiveCodeBench V5/V6 (up to 66.0%).
- Gradient-Preserving Clipping Policy Optimization (GPPO): A novel RL method that, unlike standard PPO-style clipping, retains gradients from clipped tokens rather than discarding them, improving both exploration and convergence during training (see the GPPO sketch after this list).
- Quality-Centric Long CoT SFT: Applies supervised fine-tuning (SFT) on high-quality, long chain-of-thought data distilled from DeepSeek-R1-0528 before the RL stage.
- Enhanced Inference Budget: Demonstrates improved performance when the inference budget is expanded to 64K tokens via YaRN context-window scaling (see the YaRN configuration sketch below).
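The core idea behind GPPO can be illustrated in a few lines of PyTorch. The sketch below is a minimal, illustrative reading of gradient-preserving clipping, not the authors' exact implementation; the function name, signature, and the epsilon defaults are assumptions. The forward value of the importance ratio is clamped as in PPO, but a stop-gradient (detach) construction routes a bounded gradient through clipped tokens instead of zeroing it.

```python
import torch

def gppo_surrogate(logp_new: torch.Tensor,
                   logp_old: torch.Tensor,
                   advantages: torch.Tensor,
                   eps_low: float = 0.2,
                   eps_high: float = 0.28) -> torch.Tensor:
    """Illustrative gradient-preserving clipped surrogate loss (sketch).

    Standard PPO clipping zeroes the gradient of any token whose
    importance ratio leaves [1 - eps_low, 1 + eps_high], so those
    tokens stop contributing to learning. Here the forward value is
    still the clamped ratio, but a detach trick keeps a bounded
    gradient flowing through the raw ratio.
    """
    ratio = torch.exp(logp_new - logp_old)  # importance ratio r_t
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Forward pass: value equals `clipped` exactly.
    # Backward pass: gradient is (clipped / ratio) * d(ratio)/d(theta),
    # i.e. rescaled toward the clip boundary rather than zeroed.
    ratio_gp = clipped.detach() * ratio / ratio.detach()
    return -(ratio_gp * advantages).mean()
```

For tokens inside the clip range this reduces exactly to the unclipped PPO term; outside the range the gradient is scaled by `clipped / ratio` instead of dropped, which is what "gradient-preserving" refers to.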
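The expanded 64K inference budget can be approximated by enabling YaRN rope scaling at load time. Below is a hedged sketch using Hugging Face transformers; the model ID and the exact scaling parameters (a 2x factor over the native 32,768-token window, in the usual Qwen-style `rope_scaling` format) are assumptions, so check the official model card for the recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model ID; verify against the official release.
model_id = "Kwai-Klear/Klear-Reasoner-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    # YaRN: stretch the native 32,768-token window by 2x to ~64K tokens.
    rope_scaling={
        "rope_type": "yarn",
        "factor": 2.0,
        "original_max_position_embeddings": 32768,
    },
)
```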
Ideal Use Cases
- Complex Mathematical Problem Solving: Excels on competition-level math benchmarks such as AIME and HMMT.
- Advanced Code Generation and Debugging: Strong performance on LiveCodeBench indicates proficiency in coding tasks.
- Research in Reinforcement Learning: The GPPO method offers insights for RL practitioners.
- Applications Requiring Deliberative Reasoning: Suitable for scenarios where careful, step-by-step problem-solving is critical.
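For the deliberative use cases above, getting started looks like any other Hugging Face causal LM, with one caveat: long-CoT models need a generous generation budget to finish their reasoning trace. The sketch below assumes the same model ID as above; the sampling values are illustrative, not official recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kwai-Klear/Klear-Reasoner-8B"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Prove that the sum of two odd integers is even."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Leave plenty of headroom: the chain-of-thought alone can run to
# thousands of tokens before the final answer appears.
output_ids = model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```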