tomascooler/affine-wh0-5FzxcV9qRtCuZRic8PyD3Zv7JSzbzqDeRa3yB5d94bahmPuZ

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 14B | Quant: FP8 | Ctx Length: 32k | Published: Jan 20, 2026 | Architecture: Transformer | Cold

Ovis2.6-30B-A3B by AIDC-AI is a Multimodal Large Language Model (MLLM) featuring a Mixture-of-Experts (MoE) architecture with 30 billion total parameters and approximately 3 billion active parameters during inference. It extends context to 64K tokens and supports image resolutions up to 2880x2880, enhancing long-document Q&A and high-resolution visual processing. This model introduces "Think with Image" for active visual reasoning and significantly reinforces OCR, document understanding, and chart analysis capabilities.


Ovis2.6-30B-A3B: Advanced Multimodal MoE Model

Ovis2.6-30B-A3B is the latest iteration in the Ovis series of Multimodal Large Language Models (MLLMs) from AIDC-AI. It significantly upgrades its LLM backbone to a Mixture-of-Experts (MoE) architecture, allowing it to scale to 30 billion total parameters while maintaining low serving costs with only ~3 billion active parameters during inference.
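The gap between total and active parameters comes from sparse routing: each token is processed by only a few experts. The sketch below is a toy, pure-Python illustration of top-k routing, not the model's actual router. The expert count, the value of `k`, the router scores, and all function names are hypothetical; only the 30B total and ~3B active figures come from the text above.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy setup: 8 scalar "experts", 2 of them active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(router_scores, k=2)
output = moe_layer(1.0, experts, router_scores, k=2)

# Figures from the text: ~3B active out of 30B total parameters.
active_fraction = 3e9 / 30e9

print("selected experts:", [i for i, _ in selected])
print("active fraction of parameters:", active_fraction)
```

In this toy layer, six of the eight experts are never evaluated for the token, which is the mechanism that keeps per-token compute close to that of a much smaller dense model.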

Key Capabilities

  • Enhanced Long-Sequence & High-Resolution Processing: Features an extended context window of 64K tokens and supports image resolutions up to 2880x2880. This is particularly beneficial for processing information-dense visual inputs and long-document question answering.
  • "Think with Image": The model can invoke visual tools (e.g., cropping, rotation) within its Chain-of-Thought to re-examine and analyze image regions, enabling multi-turn, self-reflective reasoning on complex visual tasks.
  • Reinforced OCR, Document, and Chart Understanding: Extracts structured information from visual data and reasons over the extracted content, which suits information-dense visual tasks.
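To make the "Think with Image" idea concrete, the following sketch shows how a sequence of crop and rotate tool calls could be applied to an image so that each intermediate view can be re-examined. It is an illustration under stated assumptions only: the tool names, the `(name, kwargs)` call format, and the list-of-rows image representation are all hypothetical, and the model's real tool interface is not described here.

```python
def crop(image, top, left, height, width):
    """Return the sub-grid of `image` (a list of rows) inside the given box."""
    return [row[left:left + width] for row in image[top:top + height]]

def rotate90(image):
    """Rotate a list-of-rows image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

# Hypothetical tool registry; a real system would define its own tool schema.
TOOLS = {"crop": crop, "rotate90": rotate90}

def apply_tool_calls(image, calls):
    """Apply (tool_name, kwargs) calls in order and keep every intermediate view."""
    views = []
    current = image
    for name, kwargs in calls:
        current = TOOLS[name](current, **kwargs)
        views.append(current)
    return views

# A tiny 3x4 "image" whose pixel values are just integers.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Hypothetical calls a reasoning step might emit: zoom in, then rotate the crop.
calls = [
    ("crop", {"top": 0, "left": 1, "height": 2, "width": 2}),
    ("rotate90", {}),
]

views = apply_tool_calls(image, calls)
for view in views:
    print(view)
```

Keeping every intermediate view, rather than only the final one, mirrors the multi-turn aspect described above: each transformed region can be fed back into the next reasoning step.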

Good For

  • Applications requiring advanced multimodal understanding with efficient inference.
  • Tasks involving long documents, high-resolution images, and complex visual reasoning.
  • Use cases demanding robust OCR, document understanding, and chart/diagram analysis.