daVinci-Dev-32B-MT: Agent-Native Mid-Training for Software Engineering
GAIR/daVinci-Dev-32B-MT is a 32-billion-parameter model from the daVinci-Dev family, developed by GAIR. It is a mid-training (MT) checkpoint: it has undergone agent-native mid-training but has not yet received Supervised Fine-Tuning (SFT) on environmentally-native executable trajectories. The model is built on the Qwen2.5 base model family and is specifically adapted for agentic software engineering.
Key Capabilities & Training:
- Agent-Native Mid-Training: Utilizes a novel approach to reduce the distribution mismatch between static pretraining data and dynamic, feedback-rich environments encountered by code agents.
- Specialized Data: Trained on two complementary trajectory types:
  - Contextually-native trajectories (68.6B tokens): Derived from GitHub pull requests, preserving the full information flow from file discovery to sequential edits.
  - Environmentally-native executable trajectories (3.1B raw tokens): Collected from real executable repositories with genuine tool and test outputs, capturing authentic feedback loops.
- Software Engineering Focus: Designed to excel in agentic software engineering tasks, including automated code generation, bug fixing, and reasoning within development environments.
Performance & Use Cases:
- SWE-Bench Verified: daVinci-Dev-32B (which adds SFT on top of this MT checkpoint) achieves 56.1% Pass@1 on SWE-Bench Verified, demonstrating strong capabilities in solving real-world software issues.
- Generalization: Improvements are also observed on standard code benchmarks (e.g., HumanEval/EvalPlus) and scientific reasoning benchmarks (e.g., GPQA/SciBench).
- Intended Use: Primarily designed for integration into agentic scaffolds like SWE-Agent for automated software development workflows. It is compatible with standard inference frameworks like Hugging Face Transformers and vLLM.
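Since the card states compatibility with Hugging Face Transformers, a minimal loading-and-generation sketch may help. The model id comes from the card; the prompt template, dtype, and generation settings below are illustrative assumptions, not official recommendations (and as an MT checkpoint, the model may not ship a chat template, so a plain-text prompt is used).

```python
def build_prompt(issue_text: str) -> str:
    """Compose a plain-text task prompt.

    The template here is an illustrative assumption, not the model's
    official prompt format.
    """
    return f"### Issue\n{issue_text}\n\n### Proposed fix\n"


def generate_fix(issue_text: str, max_new_tokens: int = 512) -> str:
    """Run a single completion with Hugging Face Transformers.

    Heavy imports are kept local so the sketch can be read (and the
    prompt helper tested) without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "GAIR/daVinci-Dev-32B-MT"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
        device_map="auto",
    )
    inputs = tokenizer(build_prompt(issue_text), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate_fix("Fix the failing date-parsing test in utils.py"))
```

The same checkpoint can be served with vLLM (e.g., `vllm serve GAIR/daVinci-Dev-32B-MT`), though for the SWE-Agent-style workflows the card describes, the scaffold typically manages prompting and tool calls rather than a raw completion loop like the one above.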