QwenPilot/FIPO_32B
Text generation · Open weights
Concurrency cost: 2 · Model size: 32.8B · Quantization: FP8 · Context length: 32k
Published: Mar 22, 2026 · License: apache-2.0 · Architecture: Transformer

QwenPilot/FIPO_32B is a 32-billion-parameter language model developed by Qwen Pilot (Alibaba Group), built on the Qwen2.5-32B-Base architecture. It is trained with Future-KL Influenced Policy Optimization (FIPO), a value-free reinforcement learning method designed to elicit deeper reasoning. The model extends chain-of-thought reasoning beyond 10,000 tokens and substantially improves performance on complex reasoning benchmarks such as AIME 2024.
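The card does not spell out the FIPO objective. As a rough orientation only, the sketch below shows what a value-free policy-optimization update with a KL regularizer typically looks like (in the style of group-relative methods such as GRPO): advantages come from normalizing rewards within a sampled group rather than from a learned value network, and a KL term toward the reference policy constrains the update. All function names, the estimator choice, and the coefficient are assumptions for illustration, not FIPO's actual formulation.

```python
import math

# Hypothetical sketch of a value-free RL objective with a KL penalty.
# FIPO's real algorithm is not public in this card; this only illustrates
# the general shape of such methods.

def group_relative_advantages(rewards):
    """Normalize rewards within a sampled group (no value network needed)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + 1e-8) for r in rewards]

def kl_penalty(logp, ref_logp):
    """Per-token KL estimate vs. the reference policy (nonnegative k3 form)."""
    ratio = math.exp(ref_logp - logp)
    return ratio - (ref_logp - logp) - 1.0

def surrogate_objective(logps, ref_logps, rewards, beta=0.04):
    """Mean policy-gradient surrogate minus a KL regularizer (assumed beta)."""
    advs = group_relative_advantages(rewards)
    terms = [
        lp * a - beta * kl_penalty(lp, rlp)
        for lp, rlp, a in zip(logps, ref_logps, advs)
    ]
    return sum(terms) / len(terms)
```

Because the baseline is computed per group of sampled completions, no separate critic has to be trained, which is what "value-free" refers to in this family of methods.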
