infly/inf-o1-pi0

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 3, 2025 · Architecture: Transformer

infly/inf-o1-pi0 is a 32.8-billion-parameter reasoning-focused large language model developed by INFLY TECH (Shanghai) Co., Ltd., built on the Qwen2.5-32B-Instruct architecture. It is designed to strengthen reasoning across a range of industrial domains, excelling at mathematics, programming, logic, and safety tasks. Its training leverages a carefully designed data production pipeline with self-verification and backtracking mechanisms to elicit robust long-context reasoning.


INF-o1-pi0: Enhanced Reasoning Foundation Model

INF-o1-pi0 is a 32.8 billion parameter large language model developed by INFLY TECH (Shanghai) Co., Ltd., serving as an initial checkpoint for their reasoning foundation project. Built upon the Qwen2.5-32B-Instruct base, this model focuses on advancing reasoning capabilities across diverse industrial domains, including mathematics, programming, logic, and safety.

Key Capabilities & Differentiators

  • Advanced Reasoning: Emphasizes robust long-context reasoning, achieved through a unique data production pipeline incorporating self-verification and backtracking mechanisms.
  • Domain Specialization: Designed to address real-world industrial scenarios with increased precision and reliability, particularly in areas like math, logic, and SQL.
  • Performance Benchmarks: Outperforms its base model (Qwen2.5-32B-Instruct) and comparable models on several benchmarks:
    • Math: Achieves 88.60 on MATH and 40.00 on AIME24.
    • Logic: Scores 71.8 on the LSAT benchmark, significantly higher than its base.
    • Safety: Attains 77.25 on AIR-BENCH 2024, indicating strong safety alignment.
    • SQL: Shows improved performance with 55.3 on BIRD and 79.7 on SPIDER.
  • Reinforcement Learning Foundation: Serves as a crucial initial policy checkpoint for future reinforcement learning training, aiming to generalize reasoning capabilities further, especially in financial and medical domains.

When to Use This Model

INF-o1-pi0 is ideal for applications requiring strong, verifiable reasoning across complex tasks. Its strengths in mathematics, logical deduction, and structured query generation make it suitable for industrial applications where accuracy and trustworthiness of reasoning are paramount. Developers looking for a model with a robust foundation for further fine-tuning in specialized reasoning tasks, particularly in finance and medicine, will find this model highly beneficial.
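As a concrete starting point, the sketch below builds a chat-completions request for the model using the repo id from this card. It assumes an OpenAI-compatible serving endpoint; the base URL, auth, and sampling defaults are deployment-specific assumptions, not part of the model card, so only the payload construction is shown here.

```python
import json

# Repo id taken from this model card.
MODEL_ID = "infly/inf-o1-pi0"


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    The system prompt and temperature below are illustrative
    assumptions, not values published for this model.
    """
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "Reason step by step and verify your answer."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }


if __name__ == "__main__":
    # Inspect the payload; POST it to your deployment's
    # /v1/chat/completions endpoint to get a completion.
    payload = build_request("If 3x + 5 = 20, what is x?")
    print(json.dumps(payload, indent=2))
```

Sending this payload to whatever endpoint hosts the model (e.g. a vLLM or similar OpenAI-compatible server) returns the completion in the standard `choices[0].message.content` field.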