sailing-lab/SR2AM-v0.1-8B

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 19, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

SR2AM-v0.1-8B by sailing-lab is an 8-billion-parameter Self-Regulated Simulative Reasoning Agentic LLM with a 32,768-token context length. It is designed for efficient agentic reasoning and decomposes deliberation into reactive execution, simulative reasoning, and self-regulation. It achieves an overall Pass@1 of 57.0 across 11 benchmarks, which is competitive with much larger models on math, science, tabular analysis, and web information seeking tasks.


SR²AM-v0.1-8B: Self-Regulated Simulative Reasoning Agentic LLM

SR²AM-v0.1-8B is an 8-billion-parameter language model developed by sailing-lab to improve agentic reasoning through a three-system decomposition. It combines reactive execution (System I), simulative reasoning via an internal world model (System II), and self-regulation managed by a learned configurator (System III). This decomposition lets the model decide when and how deeply to plan.
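
The three-system control flow described above can be pictured with a small conceptual sketch. This is not the model's implementation: the card does not describe any interfaces, so every name here (`run_episode`, `configurator`, `planner`, `executor`, `env_step`, the `"FINAL"` action, and the toy stand-ins) is hypothetical and exists only to show the order of decisions. The configurator picks a planning depth, the planner runs only when that depth is positive, and the executor always produces the next action.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    planned: bool    # True if the planner (System II) was invoked on this step
    plan: List[str]  # simulated future states; empty when acting reactively
    action: str      # action chosen by the executor (System I)


def run_episode(
    observation: str,
    configurator: Callable[[str], int],         # System III: planning depth, 0 = no planning
    planner: Callable[[str, int], List[str]],   # System II: simulate future states
    executor: Callable[[str, List[str]], str],  # System I: pick the next action
    env_step: Callable[[str], str],             # environment: action -> next observation
    max_steps: int = 8,
) -> List[Step]:
    """Hypothetical control loop: regulate, optionally plan, then act."""
    trace: List[Step] = []
    for _ in range(max_steps):
        depth = configurator(observation)
        plan = planner(observation, depth) if depth > 0 else []
        action = executor(observation, plan)
        trace.append(Step(planned=depth > 0, plan=plan, action=action))
        if action == "FINAL":  # assumed stop signal for this sketch
            break
        observation = env_step(action)
    return trace


# Toy stand-ins (assumptions, not the model's behavior).
def toy_configurator(obs: str) -> int:
    return 2 if "hard" in obs else 0


def toy_planner(obs: str, depth: int) -> List[str]:
    return [f"predicted state {i + 1}" for i in range(depth)]


def toy_executor(obs: str, plan: List[str]) -> str:
    return "FINAL" if obs.startswith("result") else "call_tool"


def toy_env(action: str) -> str:
    return "result: 42"


trace = run_episode("hard question", toy_configurator, toy_planner, toy_executor, toy_env)
for step in trace:
    print(step)
```

In this toy run the first step plans two states ahead before calling a tool, and the second step answers reactively without planning, which is the kind of per-step choice the configurator is meant to make.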

Key Capabilities and Features

  • System I + II + III Decomposition: Employs a configurator to dynamically decide planning depth, a simulative planner for constructing future-state-grounded plans, and reactive execution for fine-grained reasoning and tool use.
  • SFT + RL Training: Uses supervised fine-tuning (SFT) on data that encodes the self-regulated planning structure, followed by reinforcement learning (RL) with GRPO to optimize for task success.
  • Agentic Tool Use: Supports web search (SerpAPI), web browsing with LLM summarization, and stateless Python code execution (SandboxFusion).
  • Compact and Efficient: Achieves an overall Pass@1 of 57.0 across 11 benchmarks spanning math, science, tabular analysis, and web information seeking. This is competitive with systems of 120B to 355B parameters, at 8B parameters and an average of 3,698 reasoning tokens per trajectory.
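
For readers unfamiliar with GRPO, its core idea is that each sampled trajectory is scored relative to the other trajectories sampled for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative advantage step in its commonly described form. The card does not give this model's reward design or GRPO variant, so the function name, the `eps` term, and the use of population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per trajectory sampled for the same prompt.
    `eps` avoids division by zero when every trajectory gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four trajectories for one prompt, two of which solved the task.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Successful trajectories receive positive advantages and failed ones negative advantages. When all trajectories in a group score the same, the advantages are zero and that group contributes no learning signal.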

When to Use This Model

SR²AM-v0.1-8B is well suited to applications that need complex, multi-step reasoning and agentic capabilities, especially where efficiency and strong performance at a small model scale matter. Its strengths lie in tasks that benefit from structured planning, self-regulation, and tool integration, such as:

  • Automated problem-solving in math and science domains.
  • Information retrieval and synthesis from the web.
  • Tasks requiring dynamic decision-making on planning depth.

For more details, refer to the project website and the research paper.