VendaDi/Qwen3-VL-8B-Instruct-Automingo
VendaDi/Qwen3-VL-8B-Instruct-Automingo is an 8 billion parameter vision-language model fine-tuned from Qwen3-VL-8B-Instruct. Developed as part of the Automingo project, it specializes in safety-critical driving Visual Question Answering (VQA) by performing structured, scenario-based reasoning over short temporal image sequences. This model excels at interpreting complex driving events and providing actionable insights for Advanced Driver-Assistance Systems (ADAS)-style reasoning tasks.
Loading preview...
VendaDi/Qwen3-VL-8B-Instruct-Automingo: Driving VQA Specialist
This model is a fine-tuned 8 billion parameter vision-language model, built upon Qwen3-VL-8B-Instruct, specifically designed for safety-critical driving Visual Question Answering (VQA). It leverages Supervised Fine-Tuning (SFT) with LoRA on the unique Automingo-VQA dataset, which comprises 6,565 images and 5,792 question-answer pairs focused on 5-frame temporal snippets of critical driving events.
Key Capabilities
- Structured Reasoning: Trained to answer complex, structured questions related to driving scenarios, emphasizing safety-critical interpretation.
- Temporal Understanding: Processes short temporal image sequences to understand dynamic events like cut-ins, traffic light transitions, and vehicle interactions.
- High Accuracy in Driving VQA: Achieves an 89.3% MCQ accuracy on the Automingo benchmark, representing a +7.8% absolute gain over the base Qwen3-VL-8B model.
- Reduced Invalid Outputs: Designed to minimize invalid or non-actionable responses, crucial for ADAS applications.
Ideal Use Cases
- Advanced Driver-Assistance Systems (ADAS): Provides specialized reasoning for ADAS-style tasks, interpreting real-time driving situations.
- Automotive Safety Research: Useful for analyzing and understanding safety-critical events in driving footage.
- Autonomous Driving Development: Can contribute to the perception and decision-making layers by offering structured interpretations of visual data.
While demonstrating strong performance in cut-in scenarios and leading vehicle interactions, the model notes remaining challenges in complex intersections and roundabouts.