XBai o4: Advanced Reasoning with Reflective Generative Form

XBai o4 is a 32-billion-parameter large language model from MetaStone-AI. Its training method, a reflective generative form, unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning." As a result, a single architecture both performs deep reasoning and efficiently selects high-quality reasoning trajectories.

Key Capabilities & Innovations

Enhanced Reasoning: Excels at complex reasoning tasks, with performance that surpasses OpenAI-o3-mini-medium.
Efficient Inference: By sharing the backbone network between Process Reward Models (PRMs) and policy models, XBai o4 reduces PRM inference costs by 99%, leading to faster and higher-quality responses.
Novel Training Paradigm: Uses a combined training form so that one model learns deep reasoning and the selection of high-quality trajectories at the same time.
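
To make trajectory selection concrete, here is a minimal sketch of how per-step process-reward scores could be used to pick one reasoning trajectory out of several sampled candidates. It is an illustration under stated assumptions, not XBai o4's actual implementation: the function names `aggregate_step_scores` and `select_best_trajectory` are hypothetical, the scores are assumed to be probabilities in (0, 1] supplied by some PRM head, and the geometric mean is just one plausible way to aggregate them.

```python
import math
from typing import List, Sequence, Tuple


def aggregate_step_scores(step_scores: Sequence[float]) -> float:
    """Collapse per-step scores into one trajectory score.

    Assumption: the geometric mean is used as the aggregate. This is an
    illustrative choice, not a confirmed detail of XBai o4.
    """
    if not step_scores:
        raise ValueError("a trajectory needs at least one scored step")
    eps = 1e-12  # guard against log(0) for a step scored as exactly zero
    log_sum = sum(math.log(max(score, eps)) for score in step_scores)
    return math.exp(log_sum / len(step_scores))


def select_best_trajectory(
    trajectories: Sequence[str],
    step_scores: Sequence[Sequence[float]],
) -> Tuple[int, str]:
    """Return (index, text) of the candidate with the highest aggregate score."""
    if len(trajectories) != len(step_scores):
        raise ValueError("need exactly one score list per trajectory")
    if not trajectories:
        raise ValueError("no candidate trajectories to choose from")
    totals: List[float] = [aggregate_step_scores(s) for s in step_scores]
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, trajectories[best]


if __name__ == "__main__":
    # Hypothetical candidates and made-up step scores, for illustration only.
    candidates = ["trajectory A", "trajectory B", "trajectory C"]
    scores = [[0.9, 0.2], [0.7, 0.8], [0.5, 0.5]]
    print(select_best_trajectory(candidates, scores))
```

Because the geometric mean penalizes a single weak step heavily, a trajectory with consistently solid steps wins over one that mixes a very strong step with a very weak one. Other aggregates (minimum, arithmetic mean, last-step score) would slot into `aggregate_step_scores` without changing the selection loop.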

Performance Highlights

XBai o4 demonstrates strong performance across various benchmarks, particularly in reasoning and coding tasks. The 'high' variant achieves 86.5 on AIME24, 77.9 on AIME25, 67.2 on LiveCodeBench v5, and 89.7 on C-EVAL, often outperforming comparable models like Qwen3-32B and OpenAI-o3-mini-medium.

Ideal Use Cases

Applications requiring complex problem-solving and logical deduction.
Scenarios where efficient and high-quality reasoning is critical.
Tasks benefiting from reduced inference costs for advanced reasoning processes.