Kwaipilot/SRPO-Qwen-32B
Text generation · Concurrency cost: 2 · Model size: 32.8B · Quant: FP8 · Context length: 32k · Published: Apr 21, 2025 · License: MIT · Architecture: Transformer

Kwaipilot/SRPO-Qwen-32B is a 32.8-billion-parameter language model developed by Kwaipilot and based on the Qwen2.5 architecture. It is trained with SRPO (two-Staged history-Resampling Policy Optimization), a reinforcement-learning framework designed to improve cross-domain reasoning on both mathematical and coding tasks. The model exhibits complex problem-solving behaviors, including self-reflection and code-based verification, and outperforms DeepSeek-R1-Zero-32B while using significantly fewer training steps.
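A minimal inference sketch using the Hugging Face `transformers` library. The repository id comes from this card; the chat-message layout, sampling defaults, and the `generate` helper itself are illustrative assumptions rather than an official Kwaipilot example, and running it requires enough GPU memory for the 32.8B-parameter weights.

```python
# Hypothetical inference sketch for Kwaipilot/SRPO-Qwen-32B.
# The prompt format and generation settings below are assumptions,
# not taken from the model card.

MODEL_ID = "Kwaipilot/SRPO-Qwen-32B"


def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat-message layout consumed by
    tokenizer.apply_chat_template (the format Qwen2.5-style models expect)."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 1024) -> str:
    """Load the model and produce a reasoning-style completion.

    Heavy dependencies are imported here so the helpers above can be
    inspected without transformers installed; calling this function
    downloads and loads the full 32.8B-parameter checkpoint.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Render the chat messages into a single prompt string, then tokenize.
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Strip the prompt tokens and decode only the newly generated answer.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("...")` triggers the full weight download, so it is best run on a machine provisioned for 32B-class models; `build_messages` can be reused as-is with any Qwen2.5-style tokenizer.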
