Name: tomascooler/affine-wh0-5FzxcV9qRtCuZRic8PyD3Zv7JSzbzqDeRa3yB5d94bahmPuZ API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: tomascooler

Ovis2.6-30B-A3B: Advanced Multimodal MoE Model

Ovis2.6-30B-A3B is the latest iteration in the Ovis series of Multimodal Large Language Models (MLLMs) from AIDC-AI. It moves the LLM backbone to a Mixture-of-Experts (MoE) architecture, scaling to 30 billion total parameters while keeping serving costs low, since only ~3 billion parameters are active during inference.
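To make the total-versus-active distinction concrete, the short sketch below computes the share of parameters used per token from the two figures quoted above. The function name `active_fraction` is hypothetical and the ~3 billion figure is approximate, so the result is a rough estimate rather than a measured property of the model.

```python
# Figures taken from the description above; the active count is approximate (~3B).
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters that participate in one forward pass.

    Hypothetical helper for illustration only; it is not part of any Ovis API.
    """
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active / total


if __name__ == "__main__":
    fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Roughly {fraction:.0%} of the parameters are active per token")
```

Note that this ratio describes per-token compute only. In a typical MoE deployment all expert weights still have to be loaded, so memory requirements follow the total parameter count rather than the active one.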

Key Capabilities

Enhanced Long-Sequence & High-Resolution Processing: Features an extended context window of 64K tokens and supports image resolutions up to 2880x2880. This is particularly beneficial for processing information-dense visual inputs and long-document question answering.
"Think with Image": Introduces a capability in which the model invokes visual tools (e.g., cropping, rotation) within its Chain-of-Thought to re-examine and analyze image regions, enabling multi-turn, self-reflective reasoning on complex visual tasks.
Reinforced OCR, Document, and Chart Understanding: Extracts structured information from visual data and reasons over the extracted content, which suits information-dense inputs such as scanned documents and charts.
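As an illustration of the 2880x2880 resolution limit mentioned above, the following sketch computes target dimensions for an oversized image while preserving its aspect ratio. It is a client-side assumption, not the model's own preprocessing: the helper name `fit_within` is hypothetical, and the serving stack may resize or tile images differently.

```python
MAX_SIDE = 2880  # maximum supported side length, taken from the description above


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Scale (width, height) down to fit a max_side x max_side box.

    Aspect ratio is preserved and images that already fit are left unchanged.
    Hypothetical helper for illustration; not part of any Ovis or Featherless API.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    # Never upscale: the scale factor is capped at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for size in [(5760, 2880), (1000, 500), (4000, 3000)]:
        print(size, "->", fit_within(*size))
```

The returned pair can be passed to whatever image library is in use for the actual resize. Downscaling trades fine detail for compatibility, so for dense documents it may be preferable to crop regions of interest instead.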

Good For

Applications requiring advanced multimodal understanding with efficient inference.
Tasks involving long documents, high-resolution images, and complex visual reasoning.
Use cases demanding robust OCR, document understanding, and chart/diagram analysis.
