OpenSWE-32B: Software Engineering Agent Training
OpenSWE-32B is a 32.8 billion parameter model from GAIR, designed specifically for software engineering (SWE) tasks. It is trained with OpenSWE, the largest fully transparent framework for SWE agent training in Python, whose dataset comprises 45,320 executable Docker environments drawn from 12.8k repositories. The framework emphasizes reproducibility: all Dockerfiles, evaluation scripts, and infrastructure are open-sourced.
Key Capabilities and Differentiators
- Unprecedented Scale and Transparency: OpenSWE provides a massive dataset of real-world software environments, constructed and curated through a multi-agent synthesis pipeline with substantial investment in environment construction and curation.
- Quality-Centric Filtering: The dataset undergoes a rigorous, difficulty-aware curation process, filtering out unsolvable or trivially simple instances to maximize learning efficiency for the model.
- State-of-the-Art Performance: OpenSWE-32B achieves 62.4% on SWE-bench Verified, a new state of the art among SFT-based methods built on the Qwen2.5 series. Models trained on OpenSWE consistently outperform alternatives such as SWE-rebench across model scales.
- Generalization: Training on OpenSWE not only improves SWE-specific performance but also yields substantial out-of-domain improvements, including gains on math (e.g., +12 points on MATH-500) and science benchmarks, without degrading factual recall.
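The difficulty-aware curation described above can be sketched as a simple solve-rate filter. This is an illustrative sketch only: the `solve_rate` field, the thresholds, and the function name are assumptions for exposition, not the actual OpenSWE pipeline, which is not specified here. The idea is that instances no agent rollout ever solves (rate near 0) or that every rollout solves (rate near 1) carry little training signal and are dropped.

```python
# Illustrative sketch of difficulty-aware filtering.
# Assumption: each instance carries an empirical solve rate estimated from
# repeated agent rollouts against its executable Docker environment.
# Thresholds and field names are hypothetical, not from OpenSWE itself.

def filter_instances(instances, min_rate=0.05, max_rate=0.95):
    """Drop instances that are unsolvable (rate ~0) or trivially easy (rate ~1)."""
    return [
        inst for inst in instances
        if min_rate <= inst["solve_rate"] <= max_rate
    ]

instances = [
    {"id": "repo-a#12", "solve_rate": 0.0},   # never solved -> dropped as unsolvable
    {"id": "repo-b#7",  "solve_rate": 0.4},   # informative difficulty -> kept
    {"id": "repo-c#3",  "solve_rate": 1.0},   # always solved -> dropped as trivial
]
kept = filter_instances(instances)
```

Keeping only mid-difficulty instances concentrates training compute on examples the model can learn from, which is what the framework's quality-centric filtering aims to maximize.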
When to Use OpenSWE-32B
- Automated Software Development: Ideal for tasks requiring agents to interact with and resolve issues within complex software environments.
- Code Generation and Bug Fixing: Excels in scenarios demanding high accuracy in understanding and modifying codebases.
- Research and Development: Provides a fully transparent and reproducible framework for advancing research in software engineering AI.