agentica-org/DeepSWE-Preview

32B parameters · FP8 · 32,768-token context · License: MIT
Overview

DeepSWE-Preview: An RL-Trained Coding Agent

DeepSWE-Preview is a 32-billion-parameter coding agent from agentica-org, built on the Qwen3-32B base model. It is trained purely with reinforcement learning (RL), with no supervised fine-tuning stage, to specialize in software engineering (SWE) tasks such as codebase navigation and multi-file editing. With test-time scaling it reaches 59.0% on SWE-Bench-Verified, placing it among the top-performing open-weights coding agents, and its SWE-Bench-Verified score improved by roughly 20 percentage points over just 200 steps of RL training.

Key Capabilities

  • Reinforcement Learning (RL) Optimization: Trained exclusively with RL, enhancing its ability to solve complex SWE problems.
  • High SWE-Bench Performance: Achieves 59.0% on SWE-Bench-Verified with test-time scaling, outperforming other open-source agents.
  • Advanced Tool Use: Integrates with R2E-Gym's tools, including Execute Bash, Search, File Editor, and Finish/Submit, for comprehensive interaction with development environments.
  • Efficient Training: Utilizes an enhanced GRPO algorithm with innovations like Clip High, No KL Loss, and Compact Filtering for stable and effective training.
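
The training modifications above can be sketched in miniature. This is an illustrative, hedged sketch of a GRPO-style objective with an asymmetric "Clip High" bound and no KL penalty; the epsilon values and the `1e-8` stabilizer are assumptions for illustration, not the exact hyperparameters used for DeepSWE-Preview:

```python
import numpy as np

def grpo_advantages(rewards):
    # GRPO: advantages are rewards normalized within a group of rollouts
    # sampled for the same task, so no learned value function is needed.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # 1e-8 guards std == 0

def clipped_surrogate(ratio, adv, eps_low=0.2, eps_high=0.28):
    # "Clip High": the upper clip bound (1 + eps_high) is wider than the
    # lower one (1 - eps_low), letting positive-advantage tokens be pushed
    # up further and encouraging exploration. No KL-to-reference penalty
    # term is added ("No KL Loss").
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return np.minimum(ratio * adv, clipped * adv).mean()
```

Compact Filtering would act one level up, masking out trajectories that, for example, hit the context or step limit before producing a verifiable patch, so their truncated rewards do not pollute the group statistics.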

Good for

  • Automated Software Development: Ideal for tasks requiring an agent to understand, navigate, and modify codebases.
  • Research in RL for LLMs: Provides a foundational model for exploring and advancing reinforcement learning applications in large language models, particularly for agentic behavior.
  • Code Generation and Debugging: Excels in scenarios demanding robust code generation, problem-solving, and automated patch creation.
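
To make the agent loop concrete, here is a minimal, hypothetical dispatcher for the four R2E-Gym-style tools named above (Execute Bash, Search, File Editor, Finish/Submit). The tool names follow this card, but the argument schemas and function signatures are assumptions for illustration, not the actual R2E-Gym API:

```python
import subprocess

def execute_bash(cmd: str) -> str:
    # Run a shell command and return combined stdout/stderr to the model.
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return out.stdout + out.stderr

def search(pattern: str, path: str = ".") -> str:
    # Grep-style codebase search (illustrative implementation).
    return execute_bash(f"grep -rn {pattern!r} {path}")

def file_editor(path: str, old: str, new: str) -> str:
    # Replace the first occurrence of a snippet, as a stand-in for a
    # structured multi-file editing tool.
    text = open(path).read()
    if old not in text:
        return "error: snippet not found"
    open(path, "w").write(text.replace(old, new, 1))
    return "ok"

TOOLS = {"execute_bash": execute_bash, "search": search,
         "file_editor": file_editor}

def dispatch(call: dict) -> str:
    # Each model turn yields {"tool": name, "args": {...}}; "finish"
    # ends the episode and submits the accumulated patch for evaluation.
    if call["tool"] == "finish":
        return "submitted"
    return TOOLS[call["tool"]](**call["args"])
```

In an RL rollout, the environment would feed each tool's string output back as the next observation and score the final submitted patch against the task's test suite.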