Name: XiaomiMiMo/MiMo-V2.5 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: XiaomiMiMo

MiMo-V2.5: Omnimodal Agentic Model

MiMo-V2.5, developed by XiaomiMiMo, is an omnimodal model built on a sparse Mixture of Experts (MoE) architecture, with 310 billion total parameters, of which 15 billion are activated per token. It supports a context length of up to 1 million tokens, enabling long-context understanding and reasoning across text, image, video, and audio inputs.
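To illustrate what "sparse" means here, the sketch below uses generic top-k gating: a router scores every expert, but only the k highest-scoring experts run for a given token. The card does not describe MiMo-V2.5's router, expert count, or k. The four scaling experts, the router weights, and k=2 are hypothetical toy values. The only figures taken from the card are the 15B active and 310B total parameters, which mean that under 5% of the weights participate in each token's forward pass.

```python
import math
from typing import Callable, List, Sequence, Tuple

# From the card: 15B of 310B parameters are active for each token.
ACTIVE_FRACTION = 15e9 / 310e9  # roughly 0.048, i.e. under 5% of the weights

Expert = Callable[[Sequence[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Sequence[float],
                router_weights: Sequence[Sequence[float]],
                experts: Sequence[Expert],
                k: int = 2) -> Tuple[List[float], List[int]]:
    """Generic top-k MoE layer (an assumption, not MiMo-V2.5's actual router)."""
    # Router score per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the k best experts; the rest are never evaluated (the sparse part).
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate weights over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


# Hypothetical toy setup: 4 experts that simply scale their input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
out, chosen = moe_forward([0.5, 0.2], router, experts, k=2)
print(chosen)  # [2, 0]
```

Only experts 2 and 0 are evaluated; experts 1 and 3 cost nothing for this token, which is how total parameter count can grow without growing per-token compute.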

Key Capabilities

Native Omnimodal Understanding: Processes and integrates text, image, video, and audio inputs within a unified architecture.
Hybrid Attention Architecture: Combines Sliding Window Attention (SWA) and Global Attention (GA) layers to reduce KV-cache storage while maintaining long-context performance.
Dedicated Encoders: Incorporates a 729M-parameter Vision Transformer (ViT) and a 261M-parameter audio encoder for high-quality multimodal perception.
Agentic Workflows: Post-trained with supervised fine-tuning (SFT), large-scale agentic reinforcement learning (RL), and Multi-Teacher On-Policy Distillation (MOPD) to strengthen agentic capabilities.
Efficient Inference: Features Multi-Token Prediction (MTP) modules to accelerate inference through speculative decoding.
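The KV-cache saving from the hybrid SWA/GA design can be estimated with simple arithmetic: a global-attention layer caches keys and values for every token in the context, while a sliding-window layer caches at most `window` tokens. The card does not publish MiMo-V2.5's layer counts, window size, head configuration, or cache precision, so every number in the example config below is hypothetical and chosen only to show the shape of the trade-off.

```python
def kv_cache_bytes(seq_len: int, n_global: int, n_swa: int, window: int,
                   n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Total KV-cache size for a hybrid stack of global and sliding-window layers."""
    # Per layer, per cached token: one K and one V vector for each KV head.
    per_token = 2 * n_kv_heads * head_dim * bytes_per_elem
    # Global-attention layers keep the whole sequence...
    global_tokens = n_global * seq_len
    # ...while SWA layers keep only the most recent `window` tokens.
    swa_tokens = n_swa * min(seq_len, window)
    return per_token * (global_tokens + swa_tokens)


# Hypothetical config (NOT MiMo-V2.5's published values): 48 layers total.
CFG = dict(n_kv_heads=8, head_dim=128)  # bytes_per_elem=2 assumes a 16-bit cache
seq = 1_000_000  # the card's maximum context length
hybrid = kv_cache_bytes(seq, n_global=8, n_swa=40, window=128, **CFG)
full = kv_cache_bytes(seq, n_global=48, n_swa=0, window=128, **CFG)
print(f"hybrid {hybrid / 1e9:.1f} GB vs all-global {full / 1e9:.1f} GB")
# hybrid 32.8 GB vs all-global 196.6 GB
```

Because SWA layers contribute a constant amount regardless of context length, the cache cost at 1 million tokens is dominated by the few global layers, which is what lets a hybrid design keep long-context recall without paying full-attention memory in every layer.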
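The card states only that MTP modules accelerate inference through speculative decoding; it does not specify how drafts are verified. The sketch below uses the generic greedy draft-and-verify scheme, with a cheap `draft` function standing in for the MTP heads and `target` for the full model. Both functions, the toy "next digit" model, and k=3 are hypothetical. With greedy verification, the output is identical to what the target model would produce on its own; speculation only changes how many target calls are needed.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a context to its greedy next token


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int = 3) -> List[int]:
    """One round of greedy speculative decoding (generic scheme, assumed here)."""
    # 1. Draft k tokens autoregressively with the cheap model.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # 2. Verify each drafted token against the target's greedy choice.
    #    (A real system scores all k positions in one batched forward pass.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return prefix + accepted
        accepted.append(tok)
        ctx.append(tok)
    # 3. All k drafts accepted: the verification pass also scored the next
    #    position, so the target's token there is appended as a bonus.
    accepted.append(target(ctx))
    return prefix + accepted


# Hypothetical toy models over digits 0-9.
target = lambda ctx: (ctx[-1] + 1) % 10
bad_draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) < 3 else 0

print(speculative_step([0], target, target))     # [0, 1, 2, 3, 4]
print(speculative_step([0], bad_draft, target))  # [0, 1, 2, 3]
```

A perfect drafter yields k + 1 tokens per round; a drafter that goes wrong at the third position still yields three correct tokens, so each round makes progress at least as fast as plain one-token decoding.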

Good For

Applications requiring multimodal perception across text, image, video, and audio.
Tasks demanding long-context reasoning and understanding.
Developing agentic systems that can interact and perform complex workflows.
Scenarios where efficient processing of large multimodal inputs is crucial.