XiaomiMiMo/MiMo-V2-Flash

License: MIT
Overview

MiMo-V2-Flash: High-Speed Agentic MoE Model

MiMo-V2-Flash is a Mixture-of-Experts (MoE) language model from XiaomiMiMo with 309B total parameters, of which 15B are active per token. It is engineered for high-speed reasoning and advanced agentic workflows and supports a 256k-token context length.

Key Innovations & Capabilities

  • Hybrid Attention Architecture: Interleaves Sliding Window Attention (SWA) and Global Attention (GA) at a 5:1 ratio with a 128-token window, significantly reducing KV-cache storage while maintaining long-context performance via a learnable attention-sink bias (see the layer-pattern sketch after this list).
  • Multi-Token Prediction (MTP): Integrates a lightweight 0.33B-parameter MTP module built on dense FFNs, tripling decoding speed at inference time and accelerating RL training rollouts (see the draft-and-verify sketch below).
  • Efficient Pre-Training: Trained on 27T tokens using FP8 mixed precision and a native 32k sequence length, with support for up to 256k context.
  • Agentic Capabilities: Achieves superior performance on SWE-Bench and complex reasoning tasks through Multi-Teacher On-Policy Distillation (MOPD) and large-scale agentic Reinforcement Learning (RL).
  • Benchmark Performance: Demonstrates strong results across general, math, code, and long-context benchmarks, often outperforming models with larger active parameter counts, particularly in agentic tasks like SWE-Bench Verified (73.4%) and τ²-Bench (80.3%).
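
The following is a minimal sketch of the hybrid attention layout, assuming only the 5:1 SWA-to-GA ratio and 128-token window stated above; the layer count and function names are illustrative, not the released implementation.

```python
# Illustrative only: reconstructs a 5:1 sliding-window / global interleave and a
# sliding-window causal mask from the figures quoted in this card.
import torch

NUM_LAYERS = 48   # placeholder depth; the real config may differ
SWA_PER_GA = 5    # five sliding-window layers per global-attention layer
WINDOW = 128      # sliding-window size in tokens

def layer_attention_types(num_layers: int = NUM_LAYERS) -> list[str]:
    """Return 'swa' or 'global' per layer, interleaved at the 5:1 ratio."""
    return ["global" if (i + 1) % (SWA_PER_GA + 1) == 0 else "swa"
            for i in range(num_layers)]

def sliding_window_mask(seq_len: int, window: int = WINDOW) -> torch.Tensor:
    """Boolean mask where each token attends only to the most recent `window` positions (itself included)."""
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]                # standard causal constraint
    in_window = (idx[:, None] - idx[None, :]) < window   # limit lookback to the window
    return causal & in_window
```

Under this layout, only every sixth layer keeps a full-length KV cache; the SWA layers' per-layer cache is bounded by the 128-token window regardless of context length, which is where the claimed KV-cache savings come from.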
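
The MTP speedup corresponds to a draft-and-verify (self-speculative) decoding loop. The sketch below shows the greedy accept rule for such a loop; the helper names, and the assumption that the MTP head serves as the drafter, are illustrative rather than the released inference code.

```python
# Hypothetical draft-and-verify loop: the MTP head proposes a few tokens cheaply and
# the main model verifies them in one forward pass, keeping the longest agreeing prefix.
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_fn: Callable[[List[int]], List[int]],   # assumed MTP drafter: proposes k tokens
    verify_fn: Callable[[List[int]], List[int]],  # main model: greedy prediction after every position
) -> List[int]:
    drafted = draft_fn(prefix)
    # verify_fn scores prefix + drafted in parallel; targets[j] is the main model's
    # greedy choice for the token that follows position j.
    targets = verify_fn(prefix + drafted)
    accepted: List[int] = []
    for i, tok in enumerate(drafted):
        if targets[len(prefix) - 1 + i] == tok:
            accepted.append(tok)
        else:
            break
    # Always gain at least one token: take the main model's own prediction at the
    # first disagreement (or after all accepted drafts).
    accepted.append(targets[len(prefix) - 1 + len(accepted)])
    return prefix + accepted
```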

Ideal Use Cases

MiMo-V2-Flash is particularly well-suited for applications requiring:

  • High-speed inference and cost-efficient deployment due to its MoE architecture and MTP.
  • Complex reasoning and problem-solving in domains like mathematics and general knowledge.
  • Advanced agentic workflows, including code generation, debugging (SWE-Bench), and general agent tasks.
  • Long-context understanding and processing, leveraging its 256k-token context window.
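
Below is a minimal quickstart sketch, assuming the checkpoint loads through Hugging Face Transformers with custom modeling code (trust_remote_code=True); consult the official repository for the supported serving stacks and recommended sampling settings.

```python
# Quickstart sketch (assumed Transformers path; dtype/quantization guidance may differ
# in the official instructions).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-V2-Flash"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # generic default; follow the repo's FP8/BF16 guidance
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "Summarize the main ideas of sliding window attention."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```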