OpenThinker-Agent-v1: Agentic Task Specialist
OpenThinker-Agent-v1 is an 8-billion-parameter model developed by open-thoughts, specifically engineered for advanced agentic tasks. Built upon the Qwen3-8B architecture, this model undergoes a rigorous two-stage training process: supervised fine-tuning (SFT) followed by reinforcement learning (RL).
Key Capabilities & Performance
This model demonstrates state-of-the-art performance at its scale on key agent benchmarks:
- Terminal-Bench 2.0: Achieves a score of 4.9, significantly outperforming the base Qwen3-8B (0.0) and even Qwen3-32B (1.9).
- SWE-Bench Verified: Scores 15.7, a substantial improvement over Qwen3-8B (0.7) and Qwen3-32B (5.7).
- OpenThoughts-TB-Dev: Reaches 17.3, surpassing Qwen3-8B (5.7) and Qwen3-32B (10.2).
Training Methodology
The model's robust performance stems from its unique training pipeline:
- Supervised Fine-Tuning (SFT): Utilizes the OpenThoughts-Agent-v1-SFT dataset, comprising approximately 15,200 traces from nl2bash (shell command formatting) and InferredBugs (C# and Java bug-fixing tasks).
- Reinforcement Learning (RL): Further refined using the OpenThoughts-Agent-v1-RL dataset, which includes around 720 tasks from the nl2bash verified dataset.
Data Filtration
To ensure training stability and quality, a three-stage filtration pipeline removes tasks with flaky or slow verifiers, tasks whose environments are unstable, and tasks too difficult even for strong models such as GPT-5 Codex.
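The three stages can be sketched as a simple pruning loop. This is a minimal illustration, not the actual OpenThoughts pipeline code; every field name and threshold below is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A candidate training task with verifier statistics.

    The schema here is hypothetical; the real pipeline's task
    representation is not published in this card.
    """
    name: str
    verifier_pass_rates: list = field(default_factory=list)  # pass rate per repeated verification run
    verify_seconds: float = 0.0   # wall-clock time of one verifier run
    env_build_ok: bool = True     # did the task environment build reliably?
    strong_model_solved: bool = True  # e.g. whether GPT-5 Codex solved it

def filter_tasks(tasks, max_verify_seconds=120.0):
    """Three-stage pruning mirroring the filtration described above:
    1) drop flaky or slow verifiers,
    2) drop unstable environments,
    3) drop tasks too hard even for a strong reference model.
    """
    kept = []
    for t in tasks:
        # Stage 1: verifier must be deterministic and fast.
        flaky = len(set(t.verifier_pass_rates)) > 1
        if flaky or t.verify_seconds > max_verify_seconds:
            continue
        # Stage 2: environment must build and run reliably.
        if not t.env_build_ok:
            continue
        # Stage 3: a strong model must be able to solve it at all.
        if not t.strong_model_solved:
            continue
        kept.append(t)
    return kept
```

The 120-second verifier budget is an arbitrary placeholder; the card does not state the actual cutoffs used.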
Ideal Use Cases
This model is particularly well-suited for applications requiring autonomous agents capable of interacting with terminal environments, resolving software bugs, and executing complex multi-step instructions.
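As a sketch of the kind of harness such a model plugs into, the loop below alternates between a model-proposed shell command and the observed result. The function names and transcript format are illustrative assumptions, not the actual API of Terminal-Bench or any OpenThoughts harness.

```python
import subprocess

def run_step(command, timeout=30):
    """Execute one shell command proposed by the agent and return
    the observation (exit code, stdout, stderr) fed back to the model."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return {
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }

def agent_loop(propose_command, goal, max_steps=10):
    """Minimal observe-act loop. `propose_command` stands in for the model:
    it maps the transcript so far to the next shell command, or None to stop."""
    transcript = [{"goal": goal}]
    for _ in range(max_steps):
        command = propose_command(transcript)
        if command is None:  # the agent decides the task is done
            break
        observation = run_step(command)
        transcript.append({"command": command, "observation": observation})
    return transcript
```

In practice `propose_command` would call the model with the goal and the accumulated observations; the cap on steps and the per-command timeout keep a misbehaving agent from running indefinitely.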