GAIR/daVinci-Dev-32B-MT

Text generation · Model size: 32.8B · Quantization: FP8 · Context length: 32k · Concurrency cost: 2 · Published: Jan 25, 2026 · License: Qwen · Architecture: Transformer

GAIR/daVinci-Dev-32B-MT is a 32-billion-parameter large language model from the GAIR daVinci-Dev family, based on Qwen2.5. This checkpoint captures the model after agent-native mid-training and is designed specifically for agentic software engineering tasks. It is trained on agent-native data, including contextually-native PR-derived trajectories and environmentally-native executable rollouts, to reduce the distribution mismatch faced by code agents. The model demonstrates strong performance on software engineering benchmarks such as SWE-Bench Verified, making it suitable for automated code generation and bug fixing within agentic frameworks.


daVinci-Dev-32B-MT: Agent-Native Mid-Training for Software Engineering

GAIR/daVinci-Dev-32B-MT is a 32-billion-parameter model from the daVinci-Dev family, developed by GAIR. It is a mid-training (MT) checkpoint: it has undergone agent-native mid-training but has not yet received Supervised Fine-Tuning (SFT) on environmentally-native executable trajectories. The model is built on the Qwen2.5 base model family and adapted specifically for agentic software engineering.

Key Capabilities & Training:

  • Agent-Native Mid-Training: Utilizes a novel approach to reduce the distribution mismatch between static pretraining data and dynamic, feedback-rich environments encountered by code agents.
  • Specialized Data: Trained on two complementary trajectory types:
    • Contextually-native trajectories (68.6B tokens): Derived from GitHub pull requests, preserving full information flow from file discovery to sequential edits.
    • Environmentally-native executable trajectories (3.1B raw tokens): Collected from real executable repositories with genuine tool and test outputs, capturing authentic feedback loops.
  • Software Engineering Focus: Designed to excel in agentic software engineering tasks, including automated code generation, bug fixing, and reasoning within development environments.
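To make the two trajectory types above concrete, here is a minimal, hypothetical sketch of what a single environmentally-native executable trajectory might look like. The field names (`repo`, `steps`, `tool_call`, `tool_output`, `tests_passed`) are illustrative assumptions for this example only, not the actual daVinci-Dev data schema.

```python
# Hypothetical record for one environmentally-native executable trajectory.
# Field names are illustrative assumptions; the real schema is not published here.
trajectory = {
    "repo": "example/project",  # executable repository the rollout ran in
    "steps": [
        {
            # genuine tool invocation with authentic environment feedback
            "tool_call": "pytest tests/test_parser.py",
            "tool_output": "1 failed, 4 passed",
        },
        {
            # sequential edit informed by the failing test output
            "tool_call": "edit src/parser.py",
            "tool_output": "ok",
        },
        {
            # feedback loop closes when the tests pass
            "tool_call": "pytest tests/test_parser.py",
            "tool_output": "5 passed",
        },
    ],
    "tests_passed": True,
}

def feedback_loop_closed(traj: dict) -> bool:
    """True if the trajectory ends in a passing state with no remaining failures."""
    last = traj["steps"][-1]["tool_output"]
    return traj["tests_passed"] and "passed" in last and "failed" not in last
```

A contextually-native PR-derived trajectory would instead carry the full information flow of a pull request: the files discovered, followed by the ordered sequence of edits.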

Performance & Use Cases:

  • SWE-Bench Verified: The daVinci-Dev-32B (which includes SFT on top of this MT checkpoint) achieves 56.1% Pass@1 on SWE-Bench Verified, demonstrating strong capabilities in solving real-world software issues.
  • Generalization: Improvements are also observed on standard code benchmarks (e.g., HumanEval/EvalPlus) and scientific reasoning benchmarks (e.g., GPQA/SciBench).
  • Intended Use: Primarily designed for integration into agentic scaffolds like SWE-Agent for automated software development workflows. It is compatible with standard inference frameworks like Hugging Face Transformers and vLLM.
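As a rough illustration of the agentic-scaffold integration described above, the loop below sketches the observe-act-feedback cycle that a SWE-Agent-style harness runs around a model. Everything here is a stub: `call_model` stands in for the actual model served via Transformers or vLLM, `run_tool` fakes the execution environment, and the file names and commands are invented for the example.

```python
# Minimal sketch of an agentic feedback loop, under stated assumptions:
# call_model and run_tool are stubs; a real scaffold would query the served
# model and execute real shell commands in a sandboxed repository.

state = {"fixed": False}  # toy environment state: whether the bug is fixed

def call_model(prompt: str) -> str:
    """Stub standing in for daVinci-Dev-32B-MT behind Transformers or vLLM."""
    if "edit applied" in prompt:
        return "run pytest"          # re-run tests after an edit
    if "failed" in prompt:
        return "edit src/app.py"     # propose a fix after seeing failures
    return "run pytest"              # otherwise, gather feedback first

def run_tool(action: str) -> str:
    """Stub environment: executes the model's action and returns tool output."""
    if action.startswith("edit"):
        state["fixed"] = True
        return "edit applied"
    return "all tests passed" if state["fixed"] else "1 failed"

def agent_loop(task: str, max_steps: int = 4) -> list[str]:
    """Feedback-rich loop: each tool output is fed into the next prompt."""
    history = [task]
    for _ in range(max_steps):
        action = call_model("\n".join(history))
        observation = run_tool(action)
        history += [f"> {action}", observation]
        if "failed" not in observation and "passed" in observation:
            break  # task resolved
    return history
```

For example, `agent_loop("fix the failing test in src/app.py")` walks through run-tests, edit, re-run-tests before terminating; the point is only the shape of the loop, in which genuine tool output conditions the model's next action.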