GAIR/daVinci-Dev-72B

Text generation · Model size: 72.7B · Quantization: FP8 · Context length: 32k · Published: Jan 23, 2026 · License: Qwen · Architecture: Transformer · Concurrency cost: 4

GAIR/daVinci-Dev-72B is a 72 billion parameter large language model developed by GAIR, specifically trained for agentic software engineering tasks. It utilizes agent-native mid-training and fine-tuning on contextually-native and environmentally-native trajectories to reduce the distribution mismatch between static pretraining corpora and dynamic coding environments. This model excels at complex software engineering challenges, achieving state-of-the-art performance on benchmarks like SWE-Bench Verified.


Overview of daVinci-Dev-72B

daVinci-Dev-72B is a 72-billion-parameter model from the daVinci-Dev family, developed by GAIR and focused on agentic software engineering. It is built on the Qwen2.5-Base architecture and trained with a methodology called agentic mid-training, which incorporates agent-native data to bridge the gap between traditional pretraining data and the dynamic, feedback-rich environments encountered by real code agents.

Key Training Methodology

The model's training involves two primary types of trajectories:

  • Contextually-native trajectories (PR-derived): These are constructed from GitHub pull requests, preserving the full information flow from file discovery and context retrieval to sequential edits. This provides broad coverage and diversity in coding scenarios.
  • Environmentally-native trajectories (executable rollouts): Collected from real executable repositories, these trajectories capture authentic feedback loops from genuine tool and test outputs, including both passing and non-passing scenarios.
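To make the two trajectory types concrete, here is a minimal sketch of what one step of an environmentally-native trajectory might contain. The field and class names are illustrative assumptions for exposition only; GAIR's actual training data format is not described on this card.

```python
from dataclasses import dataclass, field

# Hypothetical schema for an environmentally-native trajectory.
# Field names are illustrative assumptions, not GAIR's actual format.
@dataclass
class TrajectoryStep:
    tool: str           # e.g. "edit_file" or "run_tests"
    arguments: dict     # tool inputs, e.g. {"path": "src/utils.py"}
    observation: str    # raw environment feedback (test logs, tracebacks)
    tests_passed: bool  # authentic pass/fail signal from the repo's tests

@dataclass
class Trajectory:
    task: str                           # issue or PR description
    steps: list = field(default_factory=list)

    def succeeded(self) -> bool:
        """A rollout counts as passing if its final step's tests pass."""
        return bool(self.steps) and self.steps[-1].tests_passed
```

Note that, per the card, both passing and non-passing rollouts are kept, so the training data captures genuine failure feedback rather than only successful edits.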

Performance and Capabilities

daVinci-Dev-72B demonstrates strong performance in software engineering tasks, achieving 58.5% Pass@1 on SWE-Bench Verified. This places it among the state-of-the-art for open training recipes within its model size, despite starting from a non-coder base model. The model also shows generalization gains on standard code benchmarks like HumanEval/EvalPlus and scientific reasoning benchmarks such as GPQA/SciBench.

Intended Use

This model is designed for use within agentic scaffolds like SWE-Agent for automated software development and bug fixing. It is also compatible with standard inference frameworks like Hugging Face Transformers and vLLM.
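As a starting point, the model can be loaded with the standard Hugging Face Transformers API. This is a generic usage sketch, not an official recipe: the system prompt, task string, and generation settings below are illustrative assumptions, and a 72B FP8 model requires substantial GPU memory (or a framework like vLLM) to run in practice.

```python
MODEL_ID = "GAIR/daVinci-Dev-72B"

def build_messages(task: str) -> list:
    """Wrap a software-engineering task in a chat message list.
    The system prompt here is an illustrative assumption."""
    return [
        {"role": "system", "content": "You are a software engineering agent."},
        {"role": "user", "content": task},
    ]

def main() -> None:
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages("Fix the failing test in utils.py"),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    ))

if __name__ == "__main__":
    main()
```

For higher-throughput agentic use, the same model id can be served with vLLM and called through its OpenAI-compatible API instead.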