CodeScout-14B: State-of-the-Art Code Localization
CodeScout-14B, developed by OpenHands, is the strongest model in the CodeScout family, an open-source series of code search agents trained with reinforcement learning (RL). This 14-billion-parameter model is designed for repository-level code localization: given a GitHub issue description, it identifies the files, classes, and functions in the codebase that are relevant to the issue.
Key Capabilities & Differentiators
- Open-Source SOTA Performance: Achieves state-of-the-art code localization results on SWE-Bench Verified, Pro, and Lite, outperforming models 2–18 times larger.
- Terminal-Based Operation: Works through a standard Unix terminal using only common command-line tools (rg, find, grep, ls), without relying on static analysis or language-specific tooling.
- RL-Trained: Fine-tuned from Qwen3-14B using Group Sequence Policy Optimization (GSPO) with multi-level F1 rewards.
- Python-Focused: Trained and evaluated exclusively on Python repositories.
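The training signal described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CodeScout training code. It assumes the multi-level F1 reward scores predicted files, classes, and functions as sets and averages the three F1 values with equal weights (the actual levels and weighting may differ). The update step follows the published GSPO formulation: each rollout in a group gets a group-normalized advantage and a length-normalized, sequence-level importance ratio that is clipped PPO-style. The code only computes the objective's value; real training would evaluate it inside an autodiff framework over model log-probabilities. The names `Locations`, `f1`, `multi_level_f1_reward`, `group_advantages`, and `gspo_objective` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Locations:
    """Hypothetical container for one localization answer at three granularities."""
    files: set[str] = field(default_factory=set)
    classes: set[str] = field(default_factory=set)
    functions: set[str] = field(default_factory=set)


def f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 between predicted and gold locations."""
    if not predicted and not gold:
        return 1.0  # assumption: nothing to find and nothing predicted scores as perfect
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def multi_level_f1_reward(pred: Locations, gold: Locations) -> float:
    """Assumption: equal-weight mean of file-, class-, and function-level F1."""
    scores = [
        f1(pred.files, gold.files),
        f1(pred.classes, gold.classes),
        f1(pred.functions, gold.functions),
    ]
    return sum(scores) / len(scores)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within a group of rollouts for the same issue."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def gspo_objective(
    new_logps: list[list[float]],
    old_logps: list[list[float]],
    advantages: list[float],
    clip_eps: float = 0.2,  # illustrative value, not a tuned setting
) -> float:
    """Clipped GSPO surrogate (to be maximized) for one group of responses.

    new_logps[i] and old_logps[i] hold the per-token log-probs of response i
    under the current policy and the policy that generated the rollout.
    """
    total = 0.0
    for new, old, adv in zip(new_logps, old_logps, advantages):
        # Sequence-level ratio: geometric mean of the token ratios, i.e.
        # (pi_new(y|x) / pi_old(y|x)) ** (1 / len(y)).
        ratio = math.exp(sum(n - o for n, o in zip(new, old)) / len(new))
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Scoring each granularity separately gives partial credit, for example when a rollout finds the right file and class but the wrong function, which is a denser signal than a single exact-match reward. The sequence-level ratio is what distinguishes GSPO from token-level methods such as GRPO.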
Intended Use Cases
- Code Localization: Ideal for pinpointing where code modifications are needed within a repository.
- Subagent in Coding Pipelines: Designed to function as a localization subagent within broader automated coding workflows.
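To illustrate the subagent role, here is one way a parent pipeline might turn a localization result into per-file edit targets for a downstream editing agent. This is a sketch under assumptions: the `LocalizationResult` structure, the `path.py:Symbol` naming convention, and `build_edit_plan` are hypothetical, and CodeScout's actual output format and its integration with OpenHands may differ.

```python
from dataclasses import dataclass, field


@dataclass
class LocalizationResult:
    """Hypothetical hand-off format from a localization subagent."""
    files: list[str]
    # Hypothetical convention: "path.py:Class.method" or "path.py:function".
    functions: list[str] = field(default_factory=list)


def build_edit_plan(result: LocalizationResult) -> dict[str, list[str]]:
    """Group localized symbols under their file so an editor opens each file once."""
    # Every localized file gets an entry, even if no function was named in it.
    plan: dict[str, list[str]] = {path: [] for path in result.files}
    for qualified in result.functions:
        path, _, symbol = qualified.partition(":")
        plan.setdefault(path, []).append(symbol)
    return plan
```

For example, `build_edit_plan(LocalizationResult(files=["a.py"], functions=["a.py:Foo.bar", "b.py:baz"]))` returns `{"a.py": ["Foo.bar"], "b.py": ["baz"]}`. Keeping localization output as plain structured data lets the editing step stay independent of how the locations were found.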
Limitations
- Exclusively trained and evaluated on Python repositories.
- Focuses solely on code localization, not code editing or full issue resolution.
- Requires the OpenHands-Bash scaffold for optimal performance.