Satori-reasoning/Satori-7B-Round2

Available on Hugging Face

Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Feb 3, 2025 · License: apache-2.0 · Architecture: Transformer

Satori-reasoning/Satori-7B-Round2 is a 7 billion parameter language model developed by Satori-reasoning, built upon Qwen-2.5-Math-7B. This model is uniquely designed for advanced reasoning tasks, particularly excelling in mathematics, by employing a novel Chain-of-Action-Thought (COAT) reasoning framework. It features autoregressive search capabilities, allowing for self-reflection and self-exploration without external guidance, making it highly effective for complex problem-solving.


Satori-7B-Round2: Advanced Reasoning with Autoregressive Search

Satori-7B-Round2 is a 7 billion parameter language model developed by Satori-reasoning, based on the Qwen-2.5-Math-7B architecture. Its core innovation lies in its ability to perform autoregressive search, enabling self-reflection and self-exploration during reasoning without external guidance. This is achieved through a novel Chain-of-Action-Thought (COAT) reasoning framework and a two-stage post-training paradigm.

Key Capabilities & Innovations

  • Chain-of-Action-Thought (COAT) Reasoning: Utilizes special meta-action tokens (<|continue|>, <|reflect|>, <|explore|>) to guide the model's reasoning process, allowing it to build upon current trajectories, verify steps, or explore alternative solutions.
  • Two-Stage Training: Involves a small-scale format tuning (FT) stage using imitation learning to internalize the COAT format, followed by a large-scale self-improvement stage using Reinforcement Learning (RL) with "Restart and Explore" (RAE) techniques.
  • Iterative Self-improvement: Employs a kickstarting-inspired approach where knowledge from the teacher policy is distilled into the student model after each RL round, leading to continuous enhancement.
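The meta-action tokens listed above also lend themselves to straightforward post-processing of model outputs. As a minimal illustrative sketch (not part of the Satori release; the helper name and the example trace below are hypothetical, while the token strings come from the model card), the following splits a generated COAT trace into labeled reasoning segments:

```python
import re

# Meta-action tokens used by the COAT format, per the model card.
META_ACTIONS = ("<|continue|>", "<|reflect|>", "<|explore|>")

def split_coat_trace(text: str) -> list[tuple[str, str]]:
    """Split a generated COAT trace into (action, segment) pairs.

    Text appearing before the first meta-action token is labeled "start".
    """
    pattern = "(" + "|".join(re.escape(t) for t in META_ACTIONS) + ")"
    parts = re.split(pattern, text)
    segments = []
    action = "start"
    for part in parts:
        if part in META_ACTIONS:
            action = part        # subsequent text belongs to this action
        elif part.strip():
            segments.append((action, part.strip()))
    return segments

# Hypothetical trace, for illustration only.
trace = (
    "First, set up the equation 2x = 8."
    "<|continue|>Solve for x: x = 4."
    "<|reflect|>Check: substituting back gives 2*4 = 8, which is correct."
)
for action, segment in split_coat_trace(trace):
    print(action, "->", segment)
```

Segmenting traces this way makes it easy to inspect how often the model reflects or explores, which can be useful when evaluating the self-correction behavior the card describes.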

Performance Highlights

  • Mathematics Reasoning: Achieves state-of-the-art performance on various math benchmarks (GSM8K, MATH500, OlympiadBench, AMC2023, AIME2024), outperforming Qwen-2.5-Math-7B-Instruct (the instruction-tuned variant of its base model) and other small-scale models.
  • General Domain Transferability: Despite being trained primarily on math datasets, Satori-7B-Round2 demonstrates strong transferability and competitive performance on diverse out-of-domain reasoning benchmarks, including logical reasoning (FOLIO, BGQA), code reasoning (CRUXEval), commonsense reasoning (StrategyQA), tabular reasoning (TableBench), and the STEM subsets of MMLU-Pro.

Ideal Use Cases

  • Complex Mathematical Problem Solving: Excels in tasks requiring multi-step reasoning and self-correction.
  • Automated Reasoning Agents: Suitable for applications where an LLM needs to autonomously explore and refine solutions.
  • Educational Tools: Can be leveraged for generating detailed, step-by-step explanations for complex problems.

For more technical details, refer to the blog and research paper.