Kwai-Klear/Klear-Reasoner-8B
Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Aug 11, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Klear-Reasoner-8B by Kwai-Klear is an 8-billion-parameter reasoning model with a 32768-token context length, optimized for complex mathematical and coding tasks. It integrates quality-centric long CoT SFT from DeepSeek-R1-0528 and introduces Gradient-Preserving Clipping Policy Optimization (GPPO) to enhance exploration and convergence in RL. This model achieves state-of-the-art performance on challenging math and coding benchmarks, making it suitable for applications requiring advanced problem-solving capabilities.

Loading preview...

Klear-Reasoner-8B: Advanced Reasoning for Math and Code

Kwai-Klear's Klear-Reasoner-8B is an 8-billion-parameter model specifically engineered for robust reasoning in mathematics and coding. It leverages a 32768-token context window to handle complex problems requiring extensive deliberation.

Key Capabilities and Innovations

  • State-of-the-Art Performance: Achieves leading scores on challenging benchmarks such as AIME 2024/2025 (up to 90.5%) and LiveCodeBench V5/V6 (up to 66.0%).
  • Gradient-Preserving Clipping Policy Optimization (GPPO): Introduces a novel RL method that retains gradients from clipped tokens, significantly boosting exploration and convergence during training.
  • Quality-centric Long CoT SFT: Incorporates Supervised Fine-Tuning (SFT) distilled from DeepSeek-R1-0528, focusing on high-quality, long Chain-of-Thought reasoning.
  • Enhanced Inference Budget: Demonstrates improved performance with an expanded inference budget of 64K tokens, utilizing the YaRN method for scaling.

Ideal Use Cases

  • Complex Mathematical Problem Solving: Excels in competitive math benchmarks like AIME and HMMT.
  • Advanced Code Generation and Debugging: Strong performance on LiveCodeBench indicates proficiency in coding tasks.
  • Research in Reinforcement Learning: The GPPO method offers insights for RL practitioners.
  • Applications Requiring Deliberative Reasoning: Suitable for scenarios where careful, step-by-step problem-solving is critical.