Name: VendaDi/Qwen3-VL-8B-Instruct-Automingo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: VendaDi

VendaDi/Qwen3-VL-8B-Instruct-Automingo: Driving VQA Specialist

This model is a fine-tuned 8 billion parameter vision-language model, built upon Qwen3-VL-8B-Instruct, specifically designed for safety-critical driving Visual Question Answering (VQA). It leverages Supervised Fine-Tuning (SFT) with LoRA on the unique Automingo-VQA dataset, which comprises 6,565 images and 5,792 question-answer pairs focused on 5-frame temporal snippets of critical driving events.

Key Capabilities

Structured Reasoning: Trained to answer complex, structured questions related to driving scenarios, emphasizing safety-critical interpretation.
Temporal Understanding: Processes short temporal image sequences to understand dynamic events like cut-ins, traffic light transitions, and vehicle interactions.
High Accuracy in Driving VQA: Achieves an 89.3% MCQ accuracy on the Automingo benchmark, representing a +7.8% absolute gain over the base Qwen3-VL-8B model.
Reduced Invalid Outputs: Designed to minimize invalid or non-actionable responses, crucial for ADAS applications.

Ideal Use Cases

Advanced Driver-Assistance Systems (ADAS): Provides specialized reasoning for ADAS-style tasks, interpreting real-time driving situations.
Automotive Safety Research: Useful for analyzing and understanding safety-critical events in driving footage.
Autonomous Driving Development: Can contribute to the perception and decision-making layers by offering structured interpretations of visual data.

While demonstrating strong performance in cut-in scenarios and leading vehicle interactions, the model notes remaining challenges in complex intersections and roundabouts.

Overview

VendaDi/Qwen3-VL-8B-Instruct-Automingo: Driving VQA Specialist

Key Capabilities

Ideal Use Cases

Full Model Card (README)