tomascooler/affine-wh0-5FzxcV9qRtCuZRic8PyD3Zv7JSzbzqDeRa3yB5d94bahmPuZ
Ovis2.6-30B-A3B by AIDC-AI is a Multimodal Large Language Model (MLLM) built on a Mixture-of-Experts (MoE) architecture with 30 billion total parameters, of which approximately 3 billion are active during inference. Its context window extends to 64K tokens and it supports image resolutions up to 2880x2880, which improves long-document Q&A and high-resolution visual processing. The model introduces "Think with Image" for active visual reasoning and strengthens OCR, document understanding, and chart analysis.
Ovis2.6-30B-A3B: Advanced Multimodal MoE Model
Ovis2.6-30B-A3B is the latest iteration in the Ovis series of Multimodal Large Language Models (MLLMs) from AIDC-AI. It moves the LLM backbone to a Mixture-of-Experts (MoE) architecture, which lets the model scale to 30 billion total parameters while keeping serving costs low: only about 3 billion parameters are active during inference.
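To illustrate why an MoE model can hold many parameters yet run only a fraction of them per token, here is a minimal sketch of top-k expert routing. It is not the Ovis2.6 implementation: the expert count (`NUM_EXPERTS = 8`), the number of active experts (`TOP_K = 2`), and the toy scalar "experts" are all assumptions chosen for illustration. Only the 30B total and ~3B active figures come from the description above.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, router_logits, k):
    """Run only the k routed experts and mix their outputs by router weight."""
    routes = top_k_route(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Hypothetical toy configuration (NOT the real Ovis2.6 expert layout).
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, s=s: x * (s + 1) for s in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
output, used = moe_forward(1.0, experts, logits, TOP_K)

# Rough share of parameters touched per token, from the stated figures.
active_fraction = 3e9 / 30e9
print(f"experts used: {used}, active parameter share: {active_fraction:.0%}")
```

In this sketch the other six experts are never evaluated for the token, which is the source of the serving-cost savings: compute scales with the active parameters, while memory still has to hold all of them.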
Key Capabilities
- Enhanced Long-Sequence & High-Resolution Processing: Features an extended context window of 64K tokens and supports image resolutions up to 2880x2880. This is particularly beneficial for processing information-dense visual inputs and long-document question answering.
- "Think with Image": The model can actively invoke visual tools (e.g., cropping, rotation) to re-examine and analyze image regions within its Chain-of-Thought, enabling multi-turn, self-reflective reasoning for complex visual tasks.
- Reinforced OCR, Document, and Chart Understanding: Extracts structured information from visual data and reasons over the extracted content, which makes it well suited to information-dense visual tasks.
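The resolution limit and the cropping and rotation tools mentioned above can be sketched in plain Python. This is a hedged illustration only: the actual tool interface used by "Think with Image" is not described here, so `fit_within`, `crop`, `rotate90`, `TOOLS`, and `apply_tool` are hypothetical names, and images are modeled as row-major lists of pixel values rather than real image objects.

```python
MAX_SIDE = 2880  # maximum supported resolution per side, as stated above

def fit_within(width, height, max_side=MAX_SIDE):
    """Scale (width, height) down to fit max_side x max_side, keeping aspect ratio.

    Images already within the limit are returned unchanged.
    """
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))

def crop(image, left, top, right, bottom):
    """Return rows top..bottom-1 and columns left..right-1 of a pixel grid."""
    return [row[left:right] for row in image[top:bottom]]

def rotate90(image):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

# Hypothetical tool registry a reasoning loop could dispatch into.
TOOLS = {"crop": crop, "rotate90": rotate90}

def apply_tool(image, name, **kwargs):
    """Look up a tool by name and apply it to the image."""
    return TOOLS[name](image, **kwargs)

# Toy 2x3 "image": crop the right two columns, then rotate the region.
image = [[1, 2, 3],
         [4, 5, 6]]
region = apply_tool(image, "crop", left=1, top=0, right=3, bottom=2)
turned = apply_tool(region, "rotate90")

print(fit_within(5760, 2880))  # oversized input is scaled to fit the limit
print(region, turned)
```

A reasoning loop built on this idea would alternate between text steps and tool calls, feeding each cropped or rotated region back to the model for another look.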
Good For
- Applications requiring advanced multimodal understanding with efficient inference.
- Tasks involving long documents, high-resolution images, and complex visual reasoning.
- Use cases demanding robust OCR, document understanding, and chart/diagram analysis.