Kwai-Klear/Klear-Reasoner-8B-SFT

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Aug 13, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Kwai-Klear/Klear-Reasoner-8B-SFT is an 8-billion-parameter reasoning model developed by Kwai-Klear, featuring a 32768-token context length. It is specifically optimized for complex mathematical and coding tasks, achieving state-of-the-art performance on benchmarks like AIME and LiveCodeBench. The model incorporates a novel Gradient-Preserving Clipping Policy Optimization (GPPO) method to enhance exploration and convergence during training, making it particularly effective for problem-solving requiring careful deliberation.

Loading preview...

Klear-Reasoner-8B-SFT: Advanced Reasoning for Math and Code

Klear-Reasoner-8B-SFT is an 8-billion-parameter model from Kwai-Klear, designed for long reasoning capabilities, particularly in mathematics and coding. It demonstrates outstanding performance on challenging benchmarks such as AIME 2024/2025 and LiveCodeBench V5/V6, achieving scores up to 90.5% and 66.0% respectively with a 64K inference budget.

Key Innovations:

  • Quality-centric long CoT SFT: Leverages supervised fine-tuning distilled from DeepSeek-R1-0528.
  • Gradient-Preserving Clipping Policy Optimization (GPPO): A novel Reinforcement Learning (RL) method that preserves gradients from clipped tokens, significantly boosting exploration and convergence during training.

Performance Highlights:

  • Achieves competitive results against other 7B-8B models on AIME and LiveCodeBench, with notable improvements when using a 64K inference budget.
  • The model's training environment utilizes a sandbox for code evaluation (Firejail) and a math verification system (math_verify).

Use Cases:

  • Complex Mathematical Problem Solving: Excels in advanced math competitions and tasks requiring multi-step reasoning.
  • Code Generation and Debugging: Strong performance on live coding benchmarks suggests utility in programming assistance and automated code solutions.