Satori-7B-Round2: Advanced Reasoning with Autoregressive Search
Satori-7B-Round2 is a 7-billion-parameter language model developed by Satori-reasoning, built on the Qwen-2.5-Math-7B architecture. Its core innovation is autoregressive search: the model can self-reflect and self-explore during reasoning without external guidance. This is achieved through a novel Chain-of-Action-Thought (COAT) reasoning framework and a two-stage post-training paradigm.
Key Capabilities & Innovations
- Chain-of-Action-Thought (COAT) Reasoning: Uses special meta-action tokens (<|continue|>, <|reflect|>, <|explore|>) to guide the model's reasoning process, allowing it to build on the current trajectory, verify intermediate steps, or explore alternative solutions.
- Two-Stage Training: A small-scale format tuning (FT) stage uses imitation learning to internalize the COAT format, followed by a large-scale self-improvement stage using Reinforcement Learning (RL) with the "Restart and Explore" (RAE) technique.
- Iterative Self-Improvement: Employs a kickstarting-inspired approach in which the teacher policy's knowledge is distilled into the student model after each RL round, improving performance round over round.
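To make the COAT format concrete, here is a minimal, illustrative parser that splits a generated trajectory into (meta-action, segment) pairs. The token names follow the model card; the helper function and the sample trajectory are assumptions for illustration, not part of the model's actual API.

```python
import re

# Meta-action tokens from the COAT framework described above.
META_ACTIONS = ("<|continue|>", "<|reflect|>", "<|explore|>")

def split_coat_trajectory(text: str):
    """Split a COAT-style trajectory into (meta_action, segment) pairs."""
    pattern = "(" + "|".join(re.escape(t) for t in META_ACTIONS) + ")"
    parts = re.split(pattern, text)
    steps = []
    action = "<|continue|>"  # a trajectory implicitly starts by continuing
    for part in parts:
        if part in META_ACTIONS:
            action = part          # remember which action governs the next segment
        elif part.strip():
            steps.append((action, part.strip()))
    return steps

# Hypothetical trajectory for demonstration.
trajectory = (
    "Let x = 3.<|reflect|>Check: 3 satisfies the equation."
    "<|explore|>Try x = -3 as an alternative root."
)
for action, segment in split_coat_trajectory(trajectory):
    print(action, segment)
```

Each segment is tagged with the meta-action that introduced it, which is how a downstream verifier or RL reward could treat continuation, reflection, and exploration steps differently.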
Performance Highlights
- Mathematics Reasoning: Achieves state-of-the-art performance among small-scale models on math benchmarks (GSM8K, MATH500, OlympiadBench, AMC2023, AIME2024), outperforming the instruction-tuned Qwen-2.5-Math-7B-Instruct.
- General Domain Transferability: Despite being trained primarily on math datasets, Satori-7B-Round2 demonstrates strong transferability and competitive performance on diverse out-of-domain reasoning benchmarks, including logical reasoning (FOLIO, BGQA), code reasoning (CRUXEval), commonsense reasoning (StrategyQA), tabular reasoning (TableBench), and the STEM subsets of MMLU-Pro.
Ideal Use Cases
- Complex Mathematical Problem Solving: Excels in tasks requiring multi-step reasoning and self-correction.
- Automated Reasoning Agents: Suitable for applications where an LLM needs to autonomously explore and refine solutions.
- Educational Tools: Can be leveraged for generating detailed, step-by-step explanations for complex problems.
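The "autonomously explore and refine solutions" use case can be sketched as a propose-verify-explore loop. This is an illustrative toy in the spirit of the card's restart-and-explore description; the function names, the stub verifier, and the candidate enumeration are assumptions, not the model's actual training or inference code.

```python
def verify(candidate: int) -> bool:
    """Stub verifier: accept a candidate whose square is 49."""
    return candidate * candidate == 49

def restart_and_explore(candidates, verify_fn):
    """Try candidate trajectories in turn: reflect (verify) after each
    attempt, and explore the next alternative on failure."""
    for candidate in candidates:   # <|explore|>: alternative trajectories
        if verify_fn(candidate):   # <|reflect|>: verify the attempt
            return candidate
    return None                    # all explorations exhausted

print(restart_and_explore(range(10), verify))  # → 7
```

The point of the sketch is the control flow: instead of committing to a single forward pass, the search reflects on each attempt and restarts from an alternative when verification fails.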
For more technical details, refer to the accompanying blog post and research paper.