QuixiAI/Devstral-Vision-Small-2507
Overview
Devstral-Vision-Small-2507 is a multimodal language model developed by Eric Hartford at Quixi AI, integrating the robust coding prowess of Devstral-Small-2507 with the visual comprehension of Mistral-Small-3.2-24B-Instruct-2506. This 24-billion-parameter model features a 128k-token context window and is engineered for advanced software engineering tasks that require visual context.
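The 24-billion-parameter figure translates directly into hardware requirements. A back-of-envelope sketch (assuming 2 bytes per parameter for bf16/fp16 weights and 0.5 bytes per parameter for 4-bit quantized weights; runtime memory for activations and KV cache comes on top of these weight-only numbers):

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough estimate of GPU memory needed for model weights alone."""
    return num_params * bytes_per_param / 1e9

params = 24e9  # 24 billion parameters

print(estimate_weight_memory_gb(params, 2.0))  # bf16/fp16 weights -> 48.0 GB
print(estimate_weight_memory_gb(params, 0.5))  # 4-bit weights -> 12.0 GB
```

Note that weight storage is only part of the footprint: the KV cache for a long context window and activation buffers are why a 4-bit deployment still targets a 24 GB GPU rather than the 12 GB the weights alone would suggest.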
Key Capabilities
- Vision-Augmented Coding: Analyzes screenshots, UI mockups, and designs to generate and modify code.
- Debugging with Visuals: Facilitates debugging of visual rendering issues by interpreting actual screenshots.
- Design-to-Code Conversion: Converts visual designs and wireframes directly into implementation code.
- Superior Coding Performance: Inherits Devstral's strong performance on coding tasks, including multi-file editing and codebase exploration, achieving 53.6% on SWE-Bench Verified when used with OpenHands.
- Robust Vision Understanding: Maintains Mistral-Small's capabilities in interpreting UI elements, layouts, charts, and diagrams.
Good for
- Visual Software Engineering: Ideal for tasks like building UI components from screenshots or converting design mockups to code.
- Code Review with Visual Context: Reviewing code changes alongside their visual output.
- Debugging Visual Issues: Pinpointing and resolving rendering problems using visual feedback.
- Agentic Coding Tasks: Optimized for use with frameworks like OpenHands for automated development workflows.
This model was created by surgically replacing the language-model weights of Mistral-Small-3.2-24B-Instruct-2506 with those from Devstral-Small-2507, while preserving all vision components (vision tower and projector). It requires approximately 48 GB of GPU memory at full precision, or about 24 GB with 4-bit quantization.
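The weight transplant described above can be sketched as a key-prefix merge over two state dicts. This is a toy illustration, not the author's actual script: the `language_model.` prefix and the key names are assumptions standing in for whatever layout the real safetensors checkpoints use, and plain Python lists stand in for tensors.

```python
def transplant_language_weights(vision_sd, code_sd, prefix="language_model."):
    """Return a copy of vision_sd where every key under `prefix` is replaced
    by its counterpart from code_sd; all other (vision) weights are kept.

    Hypothetical sketch: real checkpoints would be loaded with safetensors
    and the prefix would match the actual checkpoint layout.
    """
    merged = dict(vision_sd)
    for key in merged:
        if key.startswith(prefix):
            donor_key = key[len(prefix):]  # strip the multimodal wrapper prefix
            if donor_key in code_sd:
                merged[key] = code_sd[donor_key]
    return merged

# Toy state dicts standing in for the two real checkpoints
vision_sd = {
    "vision_tower.patch_embed.weight": [0.0, 0.0],      # preserved
    "language_model.layers.0.attn.weight": [0.0, 0.0],  # replaced
}
code_sd = {"layers.0.attn.weight": [1.0, 1.0]}

merged = transplant_language_weights(vision_sd, code_sd)
```

The key design point is that only the language-model subtree is touched, so the vision tower and the projector that maps image features into the language model's embedding space remain exactly as trained; this works because both donors share the same underlying architecture and hidden dimensions.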