OpenThinkerAgent-8B-RL: An Agentic Coding Model

OpenThinkerAgent-8B-RL is an 8-billion-parameter model developed by OpenThoughts. It is the final, RL-trained checkpoint in their SFT→RL recipe for agentic models. Built on the Qwen3-8B architecture, the model was first trained with supervised fine-tuning (SFT) on the OpenThoughts-Agent-SFT-ColdStartForRL-10K dataset, then further trained with on-policy reinforcement learning (RL) on the OpenThoughts-Agent-RL-5K task set. This checkpoint corresponds to RL step 45.

Key Capabilities

Agentic Coding: Designed to operate as a tool-using agent that issues shell commands and file edits, then reasons over the terminal output to solve software engineering tasks.
Qwen3 Architecture: Inherits general language capabilities from its Qwen3-8B base, which has 36 layers, a hidden size of 4096, and a 40,960-token context length.
RL-Optimized: Optimized for agentic behavior through an on-policy RL training procedure that includes RLOO-n advantage estimation and PPO clipping.
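
To make the RL terms above concrete, here is a minimal sketch of a leave-one-out (RLOO-style) advantage and a PPO-style clipped objective term. It is an illustration under assumptions, not the training code for this model: the function names, the reward values, and the clip range of 0.2 are hypothetical, and the exact RLOO-n variant used in training may differ from this textbook form.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for n rollouts of the same task.

    Each rollout's baseline is the mean reward of the other n - 1 rollouts.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("leave-one-out needs at least two rollouts")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def ppo_clipped_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (to be maximized).

    `ratio` is new_policy_prob / old_policy_prob for the sampled action.
    `eps` is an assumed clip range, not a value taken from the model card.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical pass/fail rewards for four rollouts of one task.
    advantages = rloo_advantages([1.0, 0.0, 0.0, 1.0])
    print(advantages)
    print(ppo_clipped_term(ratio=1.5, advantage=advantages[0]))
```

With this baseline, rollouts that beat their siblings get positive advantages and the rest get negative ones, while the clip keeps any single update from moving the policy too far from the one that generated the rollouts.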

Good For

Software Engineering Tasks: Ideal for applications requiring an AI agent to interact with development environments, execute code, and debug.
Tool-Using Agents: Suitable for integration into systems where the model needs to leverage external tools and interpret their outputs.

Although the model is designed for agentic coding, its outputs (including shell commands) may require review and should be executed in sandboxed environments. Evaluation results for this specific 8B RL checkpoint are currently pending.
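
One possible way to follow the sandboxing advice is to wrap each model-issued command in a throwaway container. The sketch below only builds the command line and does not run anything. It assumes Docker is the sandbox; the helper name, the image `python:3.11-slim`, the memory limit, and the working directory are all hypothetical choices, not part of the model card.

```python
import shlex
from typing import List


def build_sandboxed_command(
    model_command: str,
    image: str = "python:3.11-slim",  # assumed image, pick one that suits the task
    workdir: str = "/workspace",      # assumed working directory inside the container
) -> List[str]:
    """Wrap a model-issued shell command in a restricted `docker run` call."""
    return [
        "docker", "run",
        "--rm",                # delete the container when the command exits
        "--network", "none",   # no network access from inside the sandbox
        "--memory", "512m",    # cap memory use (assumed limit)
        "--workdir", workdir,
        image,
        "sh", "-c", model_command,
    ]


if __name__ == "__main__":
    # Print the wrapped command so a human can review it before running it.
    print(shlex.join(build_sandboxed_command("ls -la && python -m pytest -q")))
```

Printing the wrapped command first keeps a human review step in the loop; the resulting list can then be passed to `subprocess.run` once it has been approved.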
