CodeScout-1.7B-RFT: A Specialized Code Localization Agent
CodeScout-1.7B-RFT is a 1.7-billion-parameter model developed by OpenHands. It is an intermediate checkpoint in the CodeScout family of open-source, RL-trained code search agents. The model is built for repository-level code localization: given an issue description, it identifies the files, classes, and functions in the codebase that are relevant to that issue.
Key Capabilities & Features
- Rejection Fine-Tuned (RFT): This model is the warm-start checkpoint that precedes the final reinforcement learning (RL) stage. It is distilled, via rejection sampling, from expert trajectories generated by the larger CodeScout-14B.
- Terminal-Based Operation: CodeScout models operate using only a standard Unix terminal, navigating and searching with commands such as rg, find, grep, and ls, without relying on static analysis or language-specific tooling.
- SWE-Bench Performance: The CodeScout family achieves competitive file- and function-level localization F1 scores on SWE-Bench, with the larger CodeScout models reaching up to 68.57 File F1 and 40.32 Func F1 on SWE-Bench Verified. These figures belong to the larger models, not to this intermediate checkpoint.
- Training Details: Trained by rejection fine-tuning Qwen3-1.7B on 4K filtered trajectories from CodeScout-14B that achieved perfect scores, using the veRL framework.
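The File F1 and Func F1 metrics and the "perfect score" filter used to build the RFT data can be illustrated together. The sketch below is a minimal illustration under stated assumptions, not CodeScout's actual reward or filtering code. It assumes that the F1 scores are set-based (predicted locations vs. gold locations), that a location is a plain string such as a file path or a `path:function` pair, and that "perfect score" means F1 = 1.0 at both the file and function level. The names `localization_f1`, `Trajectory`, and `rejection_filter` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def localization_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 between predicted and gold locations (assumed metric).

    Uses the identity F1 = 2*TP / (|predicted| + |gold|), which equals the
    harmonic mean of precision and recall and avoids dividing by zero when
    only one of the two sets is empty.
    """
    if not predicted and not gold:
        # Assumed convention: nothing to find and nothing predicted counts as perfect.
        return 1.0
    true_positives = len(predicted & gold)
    return 2 * true_positives / (len(predicted) + len(gold))


@dataclass
class Trajectory:
    """Hypothetical record of one agent rollout plus its gold annotations."""
    issue_id: str
    predicted_files: set[str]
    gold_files: set[str]
    predicted_functions: set[str]
    gold_functions: set[str]


def rejection_filter(trajectories: list[Trajectory]) -> list[Trajectory]:
    """Keep only rollouts with a perfect F1 at both file and function level."""
    return [
        t
        for t in trajectories
        if localization_f1(t.predicted_files, t.gold_files) == 1.0
        and localization_f1(t.predicted_functions, t.gold_functions) == 1.0
    ]


if __name__ == "__main__":
    perfect = Trajectory("issue-1", {"src/a.py"}, {"src/a.py"},
                         {"src/a.py:parse"}, {"src/a.py:parse"})
    # One extra predicted file lowers file-level F1 to 2/3, so this rollout is rejected.
    partial = Trajectory("issue-2", {"src/a.py", "src/b.py"}, {"src/a.py"},
                         {"src/a.py:parse"}, {"src/a.py:parse"})
    kept = rejection_filter([perfect, partial])
    print([t.issue_id for t in kept])  # ['issue-1']
```

Comparing against an exact 1.0 is safe with this formulation: 2*TP / (|P| + |G|) equals 1.0 only when the two sets are identical, and in that case the division is an integer divided by itself, which is exact in floating point.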
Intended Use Cases
- RL Research: Ideal for researchers investigating the impact of Rejection Fine-Tuning (RFT) versus Reinforcement Learning (RL) in agent training pipelines.
- Custom RL Experiments: Can serve as a starting checkpoint for developing and testing custom RL approaches to code search.
- Localization Subagent: Designed to function as a localization component within broader coding agent pipelines, focusing solely on identifying relevant code sections rather than editing or issue resolution.
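When the model runs as a localization subagent, it only needs to execute read-only search commands and report what it finds. The following is a hypothetical harness that executes one model-issued command at a time, restricted to the rg, find, grep, and ls commands named above. It is not the OpenHands-Bash scaffold, whose interface is not described here. The allowlist, the forbidden-flag checks, and the names `is_allowed` and `run_search` are assumptions for illustration, and the checks are not a complete sandbox.

```python
import shlex
import subprocess

# Hypothetical allowlist: the search commands this card says CodeScout relies on.
ALLOWED_COMMANDS = frozenset({"rg", "find", "grep", "ls"})
# find actions that can delete files, write files, or run arbitrary programs.
FORBIDDEN_FIND_ACTIONS = frozenset(
    {"-exec", "-execdir", "-ok", "-okdir", "-delete",
     "-fprint", "-fprint0", "-fprintf", "-fls"}
)


def is_allowed(command: str) -> bool:
    """Return True if the command uses an allowlisted program and no known unsafe flag."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not tokens or tokens[0] not in ALLOWED_COMMANDS:
        return False
    if tokens[0] == "find" and FORBIDDEN_FIND_ACTIONS.intersection(tokens):
        return False
    # ripgrep's --pre / --pre-glob options run an external preprocessor program.
    if tokens[0] == "rg" and any(t.startswith("--pre") for t in tokens):
        return False
    return True


def run_search(command: str, repo_dir: str, timeout: float = 30.0) -> str:
    """Run one allowlisted command inside repo_dir and return stdout plus stderr.

    No shell is involved, so pipes, redirects, and shell globs are passed to the
    program as literal arguments instead of being interpreted.
    """
    if not is_allowed(command):
        raise ValueError(f"command not allowed: {command!r}")
    result = subprocess.run(
        shlex.split(command),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # grep and rg exit with status 1 when nothing matches; not an error here
    )
    return result.stdout + result.stderr
```

For example, `run_search("rg -n 'def parse' src/", repo_dir)` would return ripgrep's matches as text for the model's next turn. Running commands without a shell is a deliberate choice in this sketch: a stray `;` or `|` in a model-generated command cannot chain in a second program.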
Limitations
- Currently trained and evaluated exclusively on Python repositories.
- Focused on code localization only, not code editing or full issue resolution.
- Requires the OpenHands-Bash scaffold for optimal performance.