open-thoughts/OpenThinkerAgent-8B-ColdStartSFTForRL

Text Generation
Concurrency Cost: 1
Model Size: 8B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: Jun 9, 2026
License: apache-2.0
Architecture: Transformer
Open Weights · Cold

OpenThinkerAgent-8B-ColdStartSFTForRL by OpenThoughts is an 8 billion parameter Qwen3-based model with a 40,960-token context length. It serves as the cold-start, pre-RL base for agentic models, fine-tuned with full-parameter SFT on agentic interaction formats and tool-use behaviors. This model is specifically designed to provide a stable starting point for subsequent reinforcement learning in agent development, rather than being a final deployable agent.


OpenThinkerAgent-8B-ColdStartSFTForRL Overview

This model, developed by OpenThoughts, is part of their open-source effort to train agentic models. It is an 8 billion parameter model based on the Qwen3 architecture, with a 40,960-token context length. Its role is to act as the cold-start, pre-RL base in an SFT (Supervised Fine-Tuning) to RL (Reinforcement Learning) pipeline.

Key Capabilities & Training

  • Agentic Interaction Foundation: The model is fine-tuned using full-parameter SFT on the OpenThoughts-Agent-SFT-ColdStartForRL-10K dataset. This teaches it the agentic interaction format and tool-use behaviors, which help stabilize subsequent reinforcement learning.
  • SWE-Smith Tasks: The training data consists of nearly 10,000 sandboxed coding tasks with tests, solved by a teacher model and oracle-verified, focusing on software engineering agent capabilities.
  • Long Context Window: Built on Qwen3-8B, it inherits a context window of 40,960 tokens, enabling it to process long multi-turn interactions.
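The "solved by a teacher model and oracle-verified" step can be pictured as a filter over teacher attempts. The following is a minimal sketch only, not the OpenThoughts pipeline: the `Trajectory` record, the `keep_verified` helper, and the `toy_oracle` check are all hypothetical names introduced for illustration, and a real oracle would presumably run each task's tests in a sandbox rather than inspect a string.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Trajectory:
    """Hypothetical record of one teacher attempt at a sandboxed coding task."""
    task_id: str
    messages: List[dict]   # chat turns, including tool calls and tool results
    final_patch: str       # the code change the teacher ended up with


def keep_verified(
    trajectories: Iterable[Trajectory],
    oracle: Callable[[Trajectory], bool],
) -> List[Trajectory]:
    """Keep only the trajectories that the oracle accepts."""
    return [t for t in trajectories if oracle(t)]


def toy_oracle(t: Trajectory) -> bool:
    """Stand-in for running the task's tests (assumption: a string check)."""
    return "return a + b" in t.final_patch


if __name__ == "__main__":
    attempts = [
        Trajectory("t1", [], "def add(a, b):\n    return a + b\n"),
        Trajectory("t2", [], "def add(a, b):\n    return a - b\n"),
    ]
    sft_examples = keep_verified(attempts, toy_oracle)
    print([t.task_id for t in sft_examples])
```

Only attempts that pass verification would become SFT examples under this sketch, so the model is trained on interaction traces that ended in a passing solution.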

Intended Use and Limitations

This OpenThinkerAgent-8B-ColdStartSFTForRL checkpoint is not intended as a standalone, deployable agent; it is designed as the starting point for agentic RL. Its standalone performance is expected to be lower than that of its RL-trained successor, OpenThinkerAgent-8B-RL, and no standalone agentic benchmark numbers are published for this cold-start version. Outputs may be incorrect or unsafe and require review.
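One way to act on the review requirement when experimenting with the checkpoint's tool calls is to put an approval gate in front of tool execution. This is a sketch under stated assumptions, not part of the model or any OpenThoughts tooling: `ToolCall`, `SAFE_TOOLS`, and `execute_with_review` are hypothetical names, and the allowlist contents are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    """Hypothetical parsed tool call emitted by the model."""
    name: str
    arguments: Dict[str, Any]


# Assumption: tools considered read-only and safe to run without review.
SAFE_TOOLS = {"read_file", "list_dir"}


def execute_with_review(
    call: ToolCall,
    tools: Dict[str, Callable[..., str]],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Run a tool call, asking a reviewer first unless the tool is allowlisted."""
    if call.name not in tools:
        return f"error: unknown tool {call.name!r}"
    if call.name not in SAFE_TOOLS and not approve(call):
        return "rejected: reviewer declined this tool call"
    return tools[call.name](**call.arguments)
```

In this sketch `approve` could prompt a human or apply a policy check; calls to unknown tools are reported back as errors rather than raised, so the result can be returned to the model as a tool message.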