Overview
DeepSWE-Preview: An RL-Trained Coding Agent
DeepSWE-Preview is a 32-billion-parameter coding agent developed by agentica-org and built on Qwen3-32B. It is trained using only reinforcement learning (RL) to specialize in software engineering (SWE) tasks, and it shows strong reasoning for codebase navigation and multi-file editing. It achieves 59.0% on SWE-Bench-Verified with test-time scaling, which places it among the top-performing open-weights models in this domain. Its performance improved by approximately 20% after just 200 steps of RL training.
Key Capabilities
- Reinforcement Learning (RL) Optimization: Trained exclusively with RL, enhancing its ability to solve complex SWE problems.
- High SWE-Bench Performance: Achieves 59.0% on SWE-Bench-Verified with test-time scaling, outperforming other open-source agents.
- Advanced Tool Use: Integrates with R2E-Gym's tools, including Execute Bash, Search, File Editor, and Finish/Submit, for comprehensive interaction with development environments.
- Efficient Training: Utilizes an enhanced GRPO algorithm with innovations like Clip High, No KL Loss, and Compact Filtering for stable and effective training.
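
To make the Efficient Training item more concrete, here is a minimal sketch of a GRPO-style objective with a wider upper clip bound (Clip High) and no KL penalty term. It is an illustration under stated assumptions, not the project's training code: the function names are hypothetical, the `eps_low` and `eps_high` values are illustrative defaults, and advantages are only mean-centred within the group (standard GRPO also divides by the group standard deviation). Compact Filtering is left out because this overview does not describe how it works.

```python
import math


def group_advantages(rewards):
    """Mean-centre rewards across rollouts sampled for the same task."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def clipped_objective(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.28):
    """Clipped surrogate for one sample, with an asymmetric clip range.

    eps_high > eps_low is the "Clip High" idea: the probability ratio may
    grow further than it may shrink. No KL term is added to the result.
    The two epsilon values are assumed, illustrative settings.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Take the more pessimistic of the unclipped and clipped estimates.
    return min(ratio * advantage, clipped * advantage)


def group_loss(logps_new, logps_old, rewards):
    """Negative mean objective over one group of rollouts (to be minimised)."""
    advantages = group_advantages(rewards)
    terms = [
        clipped_objective(new, old, adv)
        for new, old, adv in zip(logps_new, logps_old, advantages)
    ]
    return -sum(terms) / len(terms)
```

With a positive advantage, the objective stops rewarding ratio increases once the ratio passes `1 + eps_high`; with a negative advantage, the unclipped term is kept so that large harmful ratio increases are still penalized in full.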
Good for
- Automated Software Development: Ideal for tasks requiring an agent to understand, navigate, and modify codebases.
- Research in RL for LLMs: Provides a foundational model for exploring and advancing reinforcement learning applications in large language models, particularly for agentic behavior.
- Code Generation and Debugging: Excels in scenarios demanding robust code generation, problem-solving, and automated patch creation.
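
As a rough picture of how an agent navigates and modifies a codebase with the four tools named under Key Capabilities, the sketch below dispatches a sequence of tool calls until a finish action arrives. Everything in it is a hypothetical stand-in: the action dictionary shape, the handler names, and the `TOOLS` table are assumptions made for illustration and are not R2E-Gym's actual interface. In a real agent, each action would come from the model rather than from a fixed list.

```python
import subprocess
from pathlib import Path


def execute_bash(cmd, cwd):
    """Run a shell command in the workspace and return its combined output."""
    result = subprocess.run(
        cmd, shell=True, cwd=cwd, capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr


def search(term, cwd):
    """Return 'file:line: text' for every line in a .py file containing term."""
    hits = []
    for path in sorted(Path(cwd).rglob("*.py")):
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, 1):
            if term in line:
                hits.append(f"{path.relative_to(cwd)}:{lineno}: {line.strip()}")
    return "\n".join(hits)


def file_editor(path, old, new, cwd):
    """Replace one exact occurrence of old with new in the given file."""
    target = Path(cwd) / path
    text = target.read_text()
    if text.count(old) != 1:
        return "error: `old` must match exactly once"
    target.write_text(text.replace(old, new))
    return "ok"


# Hypothetical tool table; "finish" is handled by the loop itself.
TOOLS = {"execute_bash": execute_bash, "search": search, "file_editor": file_editor}


def run_episode(actions, cwd):
    """Dispatch actions in order and collect observations until 'finish'."""
    observations = []
    for action in actions:
        if action["tool"] == "finish":
            break
        handler = TOOLS[action["tool"]]
        observations.append(handler(cwd=cwd, **action.get("args", {})))
    return observations
```

The editor refuses ambiguous replacements so that a patch cannot silently change the wrong location, which is one common design choice for agent file-editing tools.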