open-thoughts/OpenThinker-Agent-v1

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Dec 5, 2025License:apache-2.0Architecture:Transformer0.1K Open Weights Cold

OpenThinker-Agent-v1 by OpenThoughts is an 8 billion parameter model post-trained from Qwen3-8B, specifically optimized for agentic tasks. It excels in environments like Terminal-Bench 2.0 and SWE-Bench, demonstrating strong performance in automated problem-solving and code-related challenges. The model was developed using a two-stage process involving supervised fine-tuning and reinforcement learning on curated datasets. It is designed for applications requiring autonomous task execution and robust agent capabilities.

Loading preview...

OpenThinker-Agent-v1 Overview

OpenThinker-Agent-v1, developed by OpenThoughts, is an 8 billion parameter model derived from Qwen3-8B, specifically engineered for agentic tasks. This model undergoes a two-stage training process: initial supervised fine-tuning (SFT) on the OpenThoughts-Agent-v1-SFT dataset, followed by reinforcement learning (RL) using the OpenThoughts-Agent-v1-RL dataset. The SFT dataset comprises approximately 15,200 traces from sources like nl2bash and InferredBugs, while the RL dataset contains around 720 tasks from nl2bash verified.

Key Capabilities and Performance

  • Agentic Task Specialization: Optimized for complex agentic environments such as Terminal-Bench 2.0 and SWE-Bench.
  • Enhanced Performance: Demonstrates significant improvements over its base model, Qwen3-8B, on agent benchmarks. For instance, it achieves 4.9 on Terminal-Bench 2.0 and 15.7 on SWE-Bench Verified, compared to Qwen3-8B's 0.0 and 0.7 respectively.
  • Robust Training Data: Utilizes meticulously curated datasets, including a three-stage filtration pipeline to ensure data quality and stability for training.

Ideal Use Cases

  • Automated Problem Solving: Suitable for applications requiring autonomous agents to solve problems in terminal or software development environments.
  • Code Generation and Debugging: Excels in tasks related to code understanding, generation, and debugging, as indicated by its performance on SWE-Bench.
  • Research and Development: A valuable resource for researchers and developers exploring advanced agentic AI systems and dataset curation techniques.