open-thoughts/OpenThinker-Agent-v1

8B parameters · FP8 · 32,768-token context · License: apache-2.0

OpenThinker-Agent-v1: Agentic Task Specialist

OpenThinker-Agent-v1 is an 8-billion-parameter model from open-thoughts, built on the Qwen3-8B architecture and engineered for agentic tasks. It is trained in two stages: supervised fine-tuning (SFT) followed by reinforcement learning (RL).
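A minimal loading sketch with the Hugging Face `transformers` library. The repo id matches this model card; the dtype and device settings are illustrative assumptions, not documented defaults:

```python
# Hypothetical loading sketch for OpenThinker-Agent-v1.
# Generation and placement settings below are illustrative assumptions.

MODEL_ID = "open-thoughts/OpenThinker-Agent-v1"

def load_agent(model_id: str = MODEL_ID):
    """Load tokenizer and model lazily so the heavyweight import is paid on first use."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" assumes `accelerate` is installed for automatic placement.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model
```

From there, the usual chat flow applies: format a conversation with `tokenizer.apply_chat_template(...)` and pass the result to `model.generate(...)`.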

Key Capabilities & Performance

At its scale, the model reports state-of-the-art results on several agent benchmarks:

  • Terminal-Bench 2.0: Achieves a score of 4.9, significantly outperforming the base Qwen3-8B (0.0) and even Qwen3-32B (1.9).
  • SWE-Bench Verified: Scores 15.7, a substantial improvement over Qwen3-8B (0.7) and Qwen3-32B (5.7).
  • OpenThoughts-TB-Dev: Reaches 17.3, surpassing Qwen3-8B (5.7) and Qwen3-32B (10.2).

Training Methodology

The model's performance stems from its two-stage training pipeline:

  • Supervised Fine-Tuning (SFT): Utilizes the OpenThoughts-Agent-v1-SFT dataset, comprising approximately 15,200 traces from nl2bash (shell command formatting) and InferredBugs (C# and Java bug-fixing tasks).
  • Reinforcement Learning (RL): Further refined using the OpenThoughts-Agent-v1-RL dataset, which includes around 720 tasks from the nl2bash verified dataset.

Data Filtration

To ensure training stability and quality, a three-stage filtration pipeline removes tasks with flaky or slow verifiers, unstable environments, or difficulty excessive even for strong models such as GPT-5 Codex.
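The three stages can be sketched as successive filters over a pool of tasks. The `Task` fields and the pass-rate threshold below are hypothetical; the model card only summarizes the criteria, not the exact implementation:

```python
# Illustrative sketch of the three-stage task filtration described above.
# Field names and thresholds are assumptions, not the project's actual pipeline.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    verifier_flaky: bool           # stage 1: verifier is flaky or slow
    env_stable: bool               # stage 2: environment stability
    strong_model_pass_rate: float  # stage 3: e.g. a GPT-5 Codex success rate

def filter_tasks(tasks, min_pass_rate=0.1):
    """Apply the three filters in order and return the surviving tasks."""
    kept = [t for t in tasks if not t.verifier_flaky]   # drop flaky/slow verifiers
    kept = [t for t in kept if t.env_stable]            # drop unstable environments
    # Drop tasks too hard even for a strong reference model.
    kept = [t for t in kept if t.strong_model_pass_rate >= min_pass_rate]
    return kept
```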

Ideal Use Cases

This model is particularly well-suited for applications requiring autonomous agents capable of interacting with terminal environments, resolving software bugs, and executing complex multi-step instructions.
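The terminal-interaction use case can be sketched as a simple propose-execute loop, where the model emits a shell command and sees its output. Everything here is an illustrative assumption (`propose_command` stands in for a model call); it is not the project's actual agent harness:

```python
# Minimal sketch of a terminal-agent loop of the kind this model targets.
import subprocess

def run_agent(propose_command, goal, max_steps=5):
    """Alternate between asking the model for a shell command and executing it."""
    transcript = []
    for _ in range(max_steps):
        cmd = propose_command(goal, transcript)  # model decides the next command
        if cmd is None:  # model signals that the goal is done
            break
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        transcript.append((cmd, result.stdout + result.stderr))  # feed output back
    return transcript
```

In a real harness, `propose_command` would format the goal and transcript into a chat prompt, call the model, and parse a command (or a stop signal) out of its reply.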