XiaomiMiMo/MiMo-V2-Flash

Public · 310B params · FP8 · 32768 context · Dec 16, 2025 · License: MIT

MiMo-V2-Flash by XiaomiMiMo is a 309B-total-parameter Mixture-of-Experts (MoE) language model with 15B active parameters, designed for high-speed reasoning and agentic workflows. It features a hybrid attention architecture and Multi-Token Prediction (MTP) for efficient inference and long-context handling up to 256k tokens. Advanced post-training, including Multi-Teacher On-Policy Distillation (MOPD) and large-scale agentic RL, gives the model strong complex-reasoning and agentic capabilities such as code generation and web development.

Overview

MiMo-V2-Flash: High-Speed Agentic MoE Model

MiMo-V2-Flash, developed by XiaomiMiMo, is a Mixture-of-Experts (MoE) language model featuring 309B total parameters and 15B active parameters. It is engineered for high-speed reasoning and agentic workflows, balancing long-context modeling with inference efficiency.

Key Innovations & Capabilities

  • Hybrid Attention Architecture: Combines Sliding Window Attention (SWA) and Global Attention (GA) with an aggressive 128-token window and learnable attention sink bias, reducing KV-cache storage by nearly 6x while supporting up to 256k context length.
  • Multi-Token Prediction (MTP): A lightweight 0.33B parameter module that triples output speed during inference and accelerates RL training rollouts.
  • Efficient Pre-Training: Trained on 27T tokens using FP8 mixed precision and a native 32k sequence length.
  • Advanced Post-Training: Utilizes Multi-Teacher On-Policy Distillation (MOPD) and large-scale agentic Reinforcement Learning (RL) on massive code agent environments (100,000+ tasks) and multimodal verifiers for web development.
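To make the hybrid-attention bullet concrete, here is a minimal NumPy sketch of a causal sliding-window attention mask and the per-layer KV-cache arithmetic it implies. The `swa_mask` helper and the layer-mix caveat are illustrative assumptions, not the model's actual implementation; the exact ~6x overall saving depends on MiMo-V2-Flash's SWA/GA layer ratio, which the card does not spell out.

```python
import numpy as np

def swa_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window attention mask: token i may attend only to
    tokens in [i - window + 1, i]. True means "attend"."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Illustrative KV-cache arithmetic (hypothetical layer layout):
# an SWA layer caches at most `window` key/value pairs per head,
# while a global-attention layer caches the full sequence.
seq_len, window = 32768, 128
full_cache = seq_len               # cached entries per head, global layer
swa_cache = min(window, seq_len)   # cached entries per head, SWA layer
print(full_cache / swa_cache)      # 256.0 per SWA layer; the ~6x overall
                                   # figure reflects the SWA/GA layer mix
```

Mixing a few global layers among many 128-token SWA layers is what keeps long-range retrieval working while most layers pay only a constant-size cache.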
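The MTP module's speedup comes from draft-and-verify decoding: a small head proposes several tokens ahead, the main model checks them, and the longest agreeing prefix is accepted. The sketch below shows only the acceptance logic with hypothetical `draft_fn`/`verify_fn` stand-ins (not MiMo's API); a real implementation verifies all k drafts in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_fn: Callable[[List[int], int], List[int]],  # proposes k tokens
    verify_fn: Callable[[List[int]], int],            # main model's greedy next token
    k: int = 3,
) -> List[int]:
    """Accept the draft head's tokens until they diverge from the main model.
    On divergence, take the main model's correction and stop."""
    proposal = draft_fn(prefix, k)
    accepted: List[int] = []
    for tok in proposal:
        target = verify_fn(prefix + accepted)
        if tok != target:
            accepted.append(target)  # correction from the main model
            return accepted
        accepted.append(tok)         # draft token confirmed
    return accepted

# Toy example: the draft head and target model agree on a +1 sequence.
draft = lambda p, k: [p[-1] + i + 1 for i in range(k)]
target = lambda p: p[-1] + 1
print(speculative_step([1, 2, 3], draft, target))  # [4, 5, 6]
```

When drafts are usually accepted, each main-model pass emits several tokens instead of one, which is how a 0.33B MTP head can roughly triple output speed.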

Performance Highlights

The model performs strongly across benchmarks, often surpassing models with larger active parameter counts. Highlights include:

  • Reasoning: High scores on MMLU-Pro, GPQA-Diamond, and AIME 2025.
  • Code Agent: Achieves 73.4% on SWE-Bench Verified and strong results on Terminal-Bench, indicating robust capabilities for automated code tasks.
  • Long Context: Maintains high accuracy up to 256k context length, with 96.7% on NIAH-Multi at 256k.

Recommended Use Cases

  • Agentic Workflows: Ideal for tasks requiring complex reasoning, tool use, and automated problem-solving, particularly in code generation and web development.
  • High-Throughput Applications: Its MTP module and efficient architecture make it suitable for scenarios demanding fast inference and high output speeds.
  • Long-Context Understanding: Excellent for processing and generating content over very long documents or conversations.