sailing-lab/SR2AM-v0.1-8B
SR2AM-v0.1-8B by sailing-lab is an 8-billion-parameter Self-Regulated Simulative Reasoning Agentic LLM with a 32,768-token context length. It is designed for efficient agentic reasoning, decomposing deliberation into reactive execution, simulative reasoning, and self-regulation. It reports an overall Pass@1 of 57.0 across 11 benchmarks covering math, science, tabular analysis, and web information seeking, which is competitive with much larger models.
SR²AM-v0.1-8B: Self-Regulated Simulative Reasoning Agentic LLM
SR²AM-v0.1-8B is an 8-billion-parameter language model developed by sailing-lab that structures agentic reasoning as a three-system decomposition. It combines reactive execution (System I), simulative reasoning over an internal world model (System II), and self-regulation by a learned configurator (System III). The configurator lets the model decide when to plan and how deeply, rather than planning at a fixed depth on every step.
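The three-system decomposition described above can be pictured as a simple control loop. The following is a minimal illustrative sketch, not the model's actual implementation: in SR²AM these roles are played by a single trained LLM, whereas here `configurator`, `planner`, `executor`, `is_done`, and `run_episode` are hypothetical stand-in callables, and the integer "state" is a toy placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    depth: int        # planning depth chosen by the configurator (0 = act reactively)
    plan: List[str]   # simulated future states; empty when depth is 0
    action: str       # action taken by the reactive executor


def run_episode(
    task: int,
    configurator: Callable[[int], int],                    # System III (hypothetical)
    planner: Callable[[int, int], List[str]],              # System II (hypothetical)
    executor: Callable[[int, List[str]], Tuple[str, int]], # System I (hypothetical)
    is_done: Callable[[int], bool],
    max_steps: int = 8,
) -> List[Step]:
    """Run one episode: regulate, optionally simulate, then act."""
    history: List[Step] = []
    state = task
    for _ in range(max_steps):
        depth = configurator(state)                        # decide whether and how deeply to plan
        plan = planner(state, depth) if depth > 0 else []  # simulate future states only if asked
        action, state = executor(state, plan)              # fine-grained reasoning or tool use
        history.append(Step(depth, plan, action))
        if is_done(state):
            break
    return history


# Toy stand-ins: "state" is the amount of remaining work.
def toy_configurator(state: int) -> int:
    return 2 if state > 2 else 0          # plan only while the task is still far from done

def toy_planner(state: int, depth: int) -> List[str]:
    return [f"remaining={state - i}" for i in range(1, depth + 1)]

def toy_executor(state: int, plan: List[str]) -> Tuple[str, int]:
    return ("step", state - 1)            # each action removes one unit of work

def toy_is_done(state: int) -> bool:
    return state <= 0


history = run_episode(4, toy_configurator, toy_planner, toy_executor, toy_is_done)
print([s.depth for s in history])
```

In this toy run the configurator requests a depth-2 plan while more than two units of work remain and then falls back to purely reactive steps, which is the kind of per-step depth decision the card attributes to System III.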
Key Capabilities and Features
- System I + II + III Decomposition: Employs a configurator to dynamically decide planning depth, a simulative planner for constructing future-state-grounded plans, and reactive execution for fine-grained reasoning and tool use.
- SFT + RL Training: Uses supervised fine-tuning (SFT) on data that encodes the self-regulated planning structure, followed by reinforcement learning with GRPO (Group Relative Policy Optimization) to optimize for task success.
- Agentic Tool Use: Supports web search (SerpAPI), web browsing with LLM summarization, and stateless Python code execution (SandboxFusion).
- Compact and Efficient: Achieves an overall Pass@1 of 57.0 across 11 diverse benchmarks, including math, science, tabular analysis, and web information seeking. This performance is competitive with systems ranging from 120B to 355B parameters, while maintaining a compact size and efficient reasoning token usage (averaging 3,698 tokens per trajectory).
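To make the GRPO step in the training recipe more concrete, here is a minimal sketch of the group-relative advantage computation used in the standard GRPO formulation: several trajectories are sampled for the same prompt, and each trajectory's reward is normalized against its own group. The card does not describe the reward function, group size, clipping, or KL settings used for SR²AM, so the binary rewards, the `eps` constant, the use of the population standard deviation, and the function name `group_relative_advantages` are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own sample group.

    Assumption: population standard deviation plus a small eps for stability;
    real implementations differ on both details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four trajectories for one prompt, scored 1 for success and 0 for failure.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Successful trajectories receive positive advantages and failed ones negative advantages, and a group in which every trajectory earns the same reward yields all-zero advantages, so it contributes no learning signal.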
When to Use This Model
SR²AM-v0.1-8B is suited to applications that need complex, multi-step agentic reasoning at a small model scale. It is strongest on tasks that benefit from structured planning, self-regulation, and tool integration, such as:
- Automated problem-solving in math and science domains.
- Information retrieval and synthesis from the web.
- Tasks requiring dynamic decision-making on planning depth.
For more details, refer to the project website and the research paper.