Name: Qwen/Qwen2.5-VL-72B-Instruct API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: Qwen

Overview

Qwen2.5-VL-72B-Instruct is the latest 72-billion-parameter, instruction-tuned vision-language model in the Qwen family, built on feedback from its predecessor, Qwen2-VL. It improves substantially on image and video understanding, which makes it well suited to complex multimodal tasks.

Key Capabilities

Advanced Visual Understanding: Recognizes common objects and analyzes intricate visual elements such as text, charts, icons, graphics, and layouts within images.
Visual Agency: Functions as a visual agent capable of reasoning and dynamically directing tools for computer and phone interactions.
Long Video Comprehension: Can understand videos longer than one hour, and adds the ability to pinpoint the video segments relevant to a given event.
Precise Visual Localization: Accurately localizes objects in images using bounding boxes or points, providing stable JSON outputs for coordinates and attributes.
Structured Output Generation: Supports structured outputs for data from invoices, forms, and tables, beneficial for financial and commercial applications.
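
As a rough illustration of consuming the JSON localization output described above, the sketch below parses a model reply into box records. The field names `bbox_2d` and `label`, the pixel-coordinate order `[x1, y1, x2, y2]`, and the optional Markdown fence around the JSON are assumptions made for this example; check them against the replies you actually receive. The helper name `parse_detections` is hypothetical.

```python
import json
import re

# Built from single backticks so this example does not contain a literal fence.
FENCE = "`" * 3
FENCED_JSON = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def parse_detections(reply):
    """Turn a model reply into a list of {"label", "box"} records.

    Assumes (not confirmed by the model card summary) that each item looks like
    {"bbox_2d": [x1, y1, x2, y2], "label": "..."}.
    """
    # Use the fenced payload if the reply wraps its JSON in a code fence.
    match = FENCED_JSON.search(reply)
    payload = match.group(1) if match else reply

    items = json.loads(payload)
    if isinstance(items, dict):
        # A single object is treated as a one-element list.
        items = [items]

    detections = []
    for item in items:
        x1, y1, x2, y2 = item["bbox_2d"]
        # Skip degenerate boxes with zero or negative width or height.
        if x2 <= x1 or y2 <= y1:
            continue
        detections.append({"label": item.get("label", ""), "box": (x1, y1, x2, y2)})
    return detections
```

Validating the boxes before use is a deliberate choice here: even when the output format is stable, downstream code such as cropping or drawing usually fails in confusing ways on an empty or inverted rectangle.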

Model Architecture Updates

Dynamic Resolution and Frame Rate Training: Extends dynamic resolution to the temporal dimension through dynamic FPS sampling, so the model can comprehend video at various sampling rates. mRoPE is also updated in the time dimension, which lets the model learn temporal sequence and speed.
Efficient Vision Encoder: Improves training and inference speeds through window attention in the ViT, further optimized with SwiGLU and RMSNorm to align with the Qwen2.5 LLM structure.
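
For readers unfamiliar with the two building blocks named in the vision encoder item, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward layer using their standard textbook definitions. It is not the model's actual implementation: the function names, the list-based vectors, and the bias-free weight layout are all assumptions chosen to keep the example self-contained.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-element learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    normed = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
    print(normed)
    print(swiglu_ffn(normed, identity, identity, identity))
```

Unlike LayerNorm, RMSNorm skips mean subtraction and the bias term, and SwiGLU replaces the single activation of a plain MLP with a gated product of two projections. Using the same pair in the ViT as in the language model is what the item above means by aligning the two structures.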

Performance

Evaluations show Qwen2.5-VL-72B-Instruct achieving competitive or leading scores across image, video, and agent benchmarks, including MMMU, MathVista, DocVQA, VideoMME, and ScreenSpot. On specific tasks it often outperforms previous Qwen-VL versions and other leading models such as GPT-4o and Claude 3.5 Sonnet.
