OpenThinkerAgent-8B-ColdStartSFTForRL Overview

This model, developed by OpenThoughts, is part of their open-source effort to train agentic models. It is an 8-billion-parameter model based on the Qwen3 architecture, with a 40,960-token context length. It serves as the cold-start, pre-RL base in a larger SFT (supervised fine-tuning) to RL (reinforcement learning) pipeline.

Key Capabilities & Training

Agentic Interaction Foundation: The model is trained with full-parameter SFT on the OpenThoughts-Agent-SFT-ColdStartForRL-10K dataset. This teaches it the agentic interaction format and tool-use behaviors, which are critical for stabilizing the subsequent reinforcement learning stage.
SWE-Smith Tasks: The training data consists of nearly 10,000 sandboxed coding tasks with tests. Each task was solved by a teacher model and oracle-verified, and the set targets software engineering agent capabilities.
Long Context Window: Built on Qwen3-8B, the model inherits a 40,960-token context window, which lets it process long multi-turn interactions.
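
As an illustration of what the 40,960-token window means for long agent sessions, here is a minimal sketch of one way to decide which turns of a conversation to send to the model. It is not part of the OpenThoughts tooling: the function name select_turns, the 2,048-token output reserve, and the policy of always keeping the first turn are assumptions made for this example, and token counts are expected to come from whatever tokenizer you already use.

```python
from typing import List

CONTEXT_LIMIT = 40_960  # context length stated for this model


def select_turns(
    turn_token_counts: List[int],
    reserve_for_output: int = 2_048,  # assumed headroom for the model's reply
    limit: int = CONTEXT_LIMIT,
) -> List[int]:
    """Return indices of turns to keep so the prompt fits in the window.

    turn_token_counts[i] is the token count of turn i, oldest first.
    Turn 0 (for example, the task description) is always kept; after that,
    the most recent turns are kept for as long as they fit.
    """
    if not turn_token_counts:
        return []

    budget = limit - reserve_for_output - turn_token_counts[0]
    if budget < 0:
        raise ValueError("first turn alone exceeds the context budget")

    kept: List[int] = []
    # Walk from the newest turn backwards and stop at the first one
    # that no longer fits, so the kept history stays contiguous.
    for i in range(len(turn_token_counts) - 1, 0, -1):
        cost = turn_token_counts[i]
        if cost > budget:
            break
        budget -= cost
        kept.append(i)

    return [0] + kept[::-1]


if __name__ == "__main__":
    # Four turns: task prompt, then three tool-use exchanges.
    print(select_turns([1_000, 20_000, 15_000, 5_000]))
```

With these example counts, the oldest tool-use exchange is dropped because keeping it would exceed the budget. Other policies, such as summarizing old turns instead of dropping them, may suit agentic workloads better.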

Intended Use and Limitations

This OpenThinkerAgent-8B-ColdStartSFTForRL checkpoint is not intended as a standalone, deployable agent; it is designed as the starting point for agentic RL. Its standalone performance is expected to be lower than that of its RL-trained successor, OpenThinkerAgent-8B-RL. No standalone agentic benchmark numbers are published for this cold-start version, and its outputs may be incorrect or unsafe, so they should be reviewed before use.
