MiMo-V2-Flash by XiaomiMiMo is a Mixture-of-Experts (MoE) language model with 309B total parameters and 15B active parameters, designed for high-speed reasoning and agentic workflows. It features a novel hybrid attention architecture and Multi-Token Prediction (MTP) for efficient inference and long-context handling up to 256k tokens. The model excels at complex reasoning and agentic tasks, including code generation and web development, achieved through advanced post-training techniques such as Multi-Teacher On-Policy Distillation (MOPD) and large-scale agentic RL.
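
Below is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub under a repo id like "XiaomiMiMo/MiMo-V2-Flash" (hypothetical identifier) and loads through the standard transformers causal-LM interface; the exact repo id, dtype, and whether `trust_remote_code` is required should be checked against the actual model listing.

```python
# Hedged sketch: repo id and loading flags below are assumptions, not confirmed by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-V2-Flash"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # shard the large MoE weights across available GPUs
    trust_remote_code=True,  # in case custom architecture code ships with the repo
)

# Simple chat-style generation request.
messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```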