open-thoughts/OpenThinkerAgent-8B-ColdStartSFTForRL
OpenThinkerAgent-8B-ColdStartSFTForRL by OpenThoughts is an 8-billion-parameter Qwen3-based model with a 40,960-token context length. It is the cold-start, pre-RL base for agentic models, fine-tuned with full-parameter SFT on agentic interaction formats and tool-use behaviors. It is designed to provide a stable starting point for subsequent reinforcement learning in agent development, not to be a final deployable agent.
OpenThinkerAgent-8B-ColdStartSFTForRL Overview
This model, developed by OpenThoughts, is part of their open-source effort to train agentic models. It is an 8-billion-parameter model based on the Qwen3 architecture with a 40,960-token context length. It serves as the cold-start, pre-RL base in a pipeline that runs supervised fine-tuning (SFT) followed by reinforcement learning (RL).
Key Capabilities & Training
- Agentic Interaction Foundation: The model is fine-tuned using full-parameter SFT on the OpenThoughts-Agent-SFT-ColdStartForRL-10K dataset. This process imbues it with the necessary agentic interaction format and tool-use behaviors, which are critical for stabilizing subsequent reinforcement learning.
- SWE-Smith Tasks: The training data consists of nearly 10,000 sandboxed coding tasks with tests, solved by a teacher model and oracle-verified, focusing on software engineering agent capabilities.
- High Context Window: Built on Qwen3-8B, it inherits a large context window of 40,960 tokens, enabling it to process extensive interactions.
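Long agent interactions can still outgrow a 40,960-token window, so a harness around the model typically needs some way to keep the prompt within budget. The following is a minimal sketch of one possible approach, not something described by the model's authors: it keeps the first message (assumed to hold the task description) and as many of the most recent turns as fit. The names `trim_history` and `approx_token_count` are hypothetical, and the whitespace-based counter is only a stand-in for the model's real tokenizer.

```python
from typing import Callable, List

CONTEXT_LENGTH = 40_960  # context length stated for this checkpoint


def approx_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(
    messages: List[str],
    reserve_for_output: int,
    count_tokens: Callable[[str], int] = approx_token_count,
    context_length: int = CONTEXT_LENGTH,
) -> List[str]:
    """Keep the first message plus the most recent ones that fit the budget."""
    if not messages:
        return []
    # Budget left for history after the first message and the reserved output.
    budget = context_length - reserve_for_output - count_tokens(messages[0])
    if budget < 0:
        raise ValueError("first message alone exceeds the available context")
    kept: List[str] = []
    # Walk backwards so the newest turns are considered first.
    for message in reversed(messages[1:]):
        cost = count_tokens(message)
        if cost > budget:
            break  # stop at the first turn that no longer fits
        kept.append(message)
        budget -= cost
    # Restore chronological order for the kept turns.
    return [messages[0]] + kept[::-1]
```

In practice `count_tokens` would be replaced by a function that calls the actual tokenizer, since word counts usually underestimate token counts.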
Intended Use and Limitations
This OpenThinkerAgent-8B-ColdStartSFTForRL checkpoint is not intended as a standalone, deployable agent. It is designed as the starting point for agentic RL, and its standalone performance is expected to be lower than that of its RL-trained successor, OpenThinkerAgent-8B-RL. Outputs may be incorrect or unsafe and require review. No standalone agentic benchmark numbers are published for this cold-start version.
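Since the checkpoint is meant to seed RL training, a typical first step is loading it as the initial policy. The sketch below assumes the identifier at the top of this page is a Hugging Face Hub repository ID and that the checkpoint loads through the Transformers `Auto*` classes; neither is confirmed here. The function name `load_cold_start_checkpoint` is hypothetical.

```python
# Assumption: the model identifier from this page is a Hugging Face Hub repo ID.
MODEL_ID = "open-thoughts/OpenThinkerAgent-8B-ColdStartSFTForRL"


def load_cold_start_checkpoint(model_id: str = MODEL_ID):
    """Load tokenizer and weights to use as the initial policy for RL."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" uses the dtype recorded in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model
```

Calling `load_cold_start_checkpoint()` downloads the full 8-billion-parameter weights, so it is best run on a machine with enough disk space and memory. The returned model would then be handed to whatever RL framework is in use.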