Satori-SFT-7B: A Reasoning-Focused 7.6B Parameter Model
Satori-reasoning/Satori-SFT-7B is a 7.6-billion-parameter supervised fine-tuning (SFT) model developed by Satori-reasoning. It serves as the foundational SFT checkpoint for the more advanced Satori-7B-Round2 reinforcement learning (RL) model. The core innovation of Satori-SFT-7B is a small-scale format tuning (FT) stage that teaches the base large language model to internalize the Chain-of-Action-Thought (COAT) reasoning format.
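As a rough illustration only, a COAT-style trace interleaves reasoning steps with meta-action tokens; the exact trace layout below is an assumption inferred from the token names in this card (see the Satori paper for the authoritative format):

```python
# Hypothetical sketch of a COAT-style reasoning trace. The step texts
# and layout are illustrative assumptions, not the official format.
CONTINUE, REFLECT, EXPLORE = "<|continue|>", "<|reflect|>", "<|explore|>"

def build_coat_trace(steps):
    """Join reasoning steps, prefixing each with its meta-action token."""
    return "".join(f"{token} {text}\n" for token, text in steps)

trace = build_coat_trace([
    (CONTINUE, "Let x be the smaller integer, so the larger is x + 2."),
    (REFLECT,  "Check: consecutive even integers differ by 2, so this holds."),
    (EXPLORE,  "Alternatively, write the pair as (n - 1, n + 1) for odd n."),
])
print(trace)
```

Each action token signals a distinct reasoning move: continuing the current line of thought, reflecting on (verifying) a prior step, or exploring an alternative approach.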
Key Capabilities & Features
- COAT Reasoning Format: The model is specifically trained to understand and utilize the Chain-of-Action-Thought reasoning format, which structures problem-solving into explicit steps.
- Foundation for RL Models: It acts as a crucial stepping stone for further reinforcement learning, providing a strong base for more complex reasoning tasks.
- Mathematical Problem Solving: The provided usage example highlights its application in solving mathematical problems efficiently and clearly, emphasizing step-by-step reasoning.
- Special Token Handling: The model's sampling parameters allow special tokens such as "<|continue|>", "<|reflect|>", and "<|explore|>" to be skipped during decoding. These tokens mark the COAT meta-actions (continue reasoning, reflect on a step, explore an alternative), so they structure the model's thought process internally but can be hidden from the final output.
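When the action tokens are not skipped at decode time, they can be removed in post-processing. A minimal sketch of such a cleanup step, assuming the raw generated text still contains the three tokens listed above:

```python
import re

# The three COAT meta-action tokens named in this model card.
COAT_TOKENS = ("<|continue|>", "<|reflect|>", "<|explore|>")

def strip_coat_tokens(text: str) -> str:
    """Remove COAT action tokens and collapse leftover whitespace,
    approximating what skipping special tokens does at decode time."""
    pattern = "|".join(re.escape(tok) for tok in COAT_TOKENS)
    cleaned = re.sub(pattern, "", text)
    return re.sub(r"[ \t]+", " ", cleaned).strip()

raw = "<|continue|> Compute 2 + 3 = 5. <|reflect|> The sum checks out."
print(strip_coat_tokens(raw))  # → "Compute 2 + 3 = 5. The sum checks out."
```

Keeping the tokens instead of stripping them can be useful for research, since they expose where the model reflects or branches during problem-solving.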
Training Data & Resources
Satori-SFT-7B was trained on the full format tuning (FT) dataset of 300,000 unique questions. Further technical details are available in the Satori project's research paper and blog.
Good For
- Developing advanced reasoning models: Ideal as a base for further fine-tuning or reinforcement learning experiments focused on structured reasoning.
- Applications requiring step-by-step problem-solving: Particularly suited for tasks that benefit from explicit, verifiable reasoning paths, such as mathematics or logical puzzles.
- Research into Chain-of-Thought methodologies: Provides a practical implementation of format tuning for reasoning enhancement.