QuixiAI/Devstral-Vision-Small-2507

Warm
Public
Vision
24B
FP8
32768
2
Jul 11, 2025
License: apache-2.0
Hugging Face
Overview

Devstral-Vision-Small-2507 Overview

Devstral-Vision-Small-2507 is a multimodal language model developed by Eric Hartford at Quixi AI, integrating the robust coding prowess of Devstral-Small-2507 with the visual comprehension of Mistral-Small-3.2-24B-Instruct-2506. This 24 billion parameter model features a 128k token context window and is engineered for advanced software engineering tasks that require visual context.

Key Capabilities

  • Vision-Augmented Coding: Analyzes screenshots, UI mockups, and designs to generate and modify code.
  • Debugging with Visuals: Facilitates debugging of visual rendering issues by interpreting actual screenshots.
  • Design-to-Code Conversion: Converts visual designs and wireframes directly into implementation code.
  • Superior Coding Performance: Inherits Devstral's strong performance on coding tasks, including multi-file editing and codebase exploration, achieving 53.6% on SWE-Bench Verified when used with OpenHands.
  • Robust Vision Understanding: Maintains Mistral-Small's capabilities in interpreting UI elements, layouts, charts, and diagrams.

Good for

  • Visual Software Engineering: Ideal for tasks like building UI components from screenshots or converting design mockups to code.
  • Code Review with Visual Context: Reviewing code changes alongside their visual output.
  • Debugging Visual Issues: Pinpointing and resolving rendering problems using visual feedback.
  • Agentic Coding Tasks: Optimized for use with frameworks like OpenHands for automated development workflows.

This model was created by surgically replacing the language model weights of Mistral-Small-3.2-24B-Instruct-2506 with those from Devstral-Small-2507, while preserving all vision components. It requires approximately 48GB of GPU memory for full precision or 24GB with 4-bit quantization.