Satori-7B-Round2: Advanced Reasoning with Autoregressive Search
Satori-7B-Round2 is a 7-billion-parameter language model developed by Satori-reasoning, built on the Qwen-2.5-Math-7B architecture. Its core innovation is autoregressive search: the model can self-reflect and self-explore during reasoning without external guidance. This is achieved through a novel Chain-of-Action-Thought (COAT) reasoning framework and a two-stage post-training paradigm.
Key Capabilities & Innovations
- Chain-of-Action-Thought (COAT) Reasoning: Uses special meta-action tokens (<|continue|>, <|reflect|>, <|explore|>) to guide the model's reasoning process, allowing it to build on the current trajectory, verify intermediate steps, or explore alternative solutions.
- Two-Stage Training: A small-scale format tuning (FT) stage uses imitation learning to internalize the COAT format, followed by a large-scale self-improvement stage using Reinforcement Learning (RL) with the "Restart and Explore" (RAE) technique.
- Iterative Self-Improvement: Employs a kickstarting-inspired approach in which the teacher policy's knowledge is distilled into the student model after each RL round, improving performance round over round.
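To make the COAT format concrete, here is a minimal, illustrative parser that splits a generated trajectory into (meta-action, segment) pairs. The token names follow the model card; the helper function and the sample trajectory are assumptions for illustration, not part of the model's actual API.

```python
import re

# Meta-action tokens from the COAT framework described above.
META_ACTIONS = ("<|continue|>", "<|reflect|>", "<|explore|>")

def split_coat_trajectory(text: str):
    """Split a COAT-style trajectory into (meta_action, segment) pairs."""
    pattern = "(" + "|".join(re.escape(t) for t in META_ACTIONS) + ")"
    parts = re.split(pattern, text)
    steps = []
    action = "<|continue|>"  # a trajectory implicitly starts by continuing
    for part in parts:
        if part in META_ACTIONS:
            action = part          # remember which action governs the next segment
        elif part.strip():
            steps.append((action, part.strip()))
    return steps

# Hypothetical trajectory for demonstration.
trajectory = (
    "Let x = 3.<|reflect|>Check: 3 satisfies the equation."
    "<|explore|>Try x = -3 as an alternative root."
)
for action, segment in split_coat_trajectory(trajectory):
    print(action, segment)
```

Each segment is tagged with the meta-action that introduced it, which is how a downstream verifier or RL reward could treat continuation, reflection, and exploration steps differently.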
Performance Highlights
- Mathematics Reasoning: Achieves state-of-the-art performance among small-scale models on math benchmarks (GSM8K, MATH500, OlympiadBench, AMC2023, AIME2024), outperforming the instruction-tuned Qwen-2.5-Math-7B-Instruct.
- General Domain Transferability: Despite being trained primarily on math datasets, Satori-7B-Round2 demonstrates strong transferability and competitive performance on diverse out-of-domain reasoning benchmarks, including logical reasoning (FOLIO, BGQA), code reasoning (CRUXEval), commonsense reasoning (StrategyQA), tabular reasoning (TableBench), and the STEM subsets of MMLU-Pro.
Ideal Use Cases
- Complex Mathematical Problem Solving: Excels in tasks requiring multi-step reasoning and self-correction.
- Automated Reasoning Agents: Suitable for applications where an LLM needs to autonomously explore and refine solutions.
- Educational Tools: Can be leveraged for generating detailed, step-by-step explanations for complex problems.
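The "autonomously explore and refine solutions" use case can be sketched as a propose-verify-explore loop. This is an illustrative toy in the spirit of the card's restart-and-explore description; the function names, the stub verifier, and the candidate enumeration are assumptions, not the model's actual training or inference code.

```python
def verify(candidate: int) -> bool:
    """Stub verifier: accept a candidate whose square is 49."""
    return candidate * candidate == 49

def restart_and_explore(candidates, verify_fn):
    """Try candidate trajectories in turn: reflect (verify) after each
    attempt, and explore the next alternative on failure."""
    for candidate in candidates:   # <|explore|>: alternative trajectories
        if verify_fn(candidate):   # <|reflect|>: verify the attempt
            return candidate
    return None                    # all explorations exhausted

print(restart_and_explore(range(10), verify))  # → 7
```

The point of the sketch is the control flow: instead of committing to a single forward pass, the search reflects on each attempt and restarts from an alternative when verification fails.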
For more technical details, refer to the accompanying blog post and research paper.