XiaomiMiMo/MiMo-7B-Base
MiMo-7B-Base is a 7 billion parameter base language model developed by Xiaomi, specifically engineered to unlock and enhance reasoning capabilities. Pre-trained on approximately 25 trillion tokens with a focus on reasoning pattern density and synthetic data, it incorporates Multiple-Token Prediction (MTP) for improved performance and accelerated inference. This model is designed to serve as a strong foundation for advanced reasoning tasks, particularly in mathematics and code, with a context length of 32768 tokens.
MiMo-7B-Base: Unlocking Reasoning Potential
MiMo-7B-Base is a 7 billion parameter foundational language model from Xiaomi, designed to excel at reasoning tasks. Unlike many models that rely primarily on post-training to elicit reasoning, MiMo-7B-Base integrates reasoning-centric strategies directly into its pre-training phase. The goal is a base model with exceptional inherent reasoning potential, one that outperforms even significantly larger models on several reasoning benchmarks.
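For orientation, here is a minimal sketch of plain text completion with the checkpoint via Hugging Face transformers. The `trust_remote_code=True` flag is an assumption based on the model's custom MTP components; verify against the model card before use.

```python
# Minimal sketch: text completion with MiMo-7B-Base through transformers.
# trust_remote_code=True is assumed to be needed for the custom MTP modules.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # place layers on available devices
    trust_remote_code=True,
)

# This is a base model: prompt it with raw text, not a chat template.
prompt = "Question: If 3x + 5 = 20, what is x?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```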
Key Capabilities & Innovations
- Reasoning-Optimized Pre-Training: The model's pre-training data pipeline is optimized for reasoning pattern density, utilizing enhanced text extraction toolkits and multi-dimensional data filtering. It also incorporates massive, diverse synthetic reasoning data.
- Massive Pre-Training Scale: MiMo-7B-Base was pre-trained on an estimated 25 trillion tokens using a three-stage data mixture strategy.
- Multiple-Token Prediction (MTP): This additional training objective enhances model performance and significantly accelerates inference, with an acceptance rate of about 90% when speculatively decoding with one MTP layer (a usage sketch follows this list).
- Strong Foundation for Post-Training: The base model demonstrates robust reasoning potential, serving as an effective starting point for supervised fine-tuning (SFT) and reinforcement learning (RL) to achieve superior performance in mathematics and code reasoning.
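As a rough illustration of the MTP speedup, the sketch below enables one speculative token at load time. It assumes Xiaomi's custom vLLM fork, where the `num_speculative_tokens` argument reportedly wires the MTP layer into speculative decoding; behavior may differ in upstream vLLM.

```python
# Sketch: speculative decoding via the MTP layer, assuming Xiaomi's vLLM fork.
# num_speculative_tokens=1 drafts one token per step with the MTP head; at the
# reported ~90% acceptance rate, most drafts are kept, accelerating decoding.
from vllm import LLM, SamplingParams

llm = LLM(
    model="XiaomiMiMo/MiMo-7B-Base",
    trust_remote_code=True,
    num_speculative_tokens=1,
)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Prove that the product of two odd integers is odd."], params)
print(outputs[0].outputs[0].text)
```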
Good For
- Developing Reasoning-Focused LLMs: Ideal as a base model for researchers and developers looking to build or fine-tune models specifically for complex reasoning challenges.
- Mathematics and Code Tasks: Provides a strong foundation for models intended for mathematical problem-solving and code generation/understanding, as evidenced by its performance in the MiMo-7B series.
- Efficient Inference: The integration of MTP makes it suitable for applications requiring accelerated inference, particularly when deployed with compatible engines such as SGLang or Xiaomi's custom vLLM fork (see the serving sketch below).
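For serving, a hedged sketch using SGLang's offline engine API follows; the `Engine` constructor and its argument names are assumptions that may vary across SGLang versions, so check the docs for your installed release.

```python
# Sketch: offline batch inference with SGLang (API names assumed; consult the
# SGLang documentation for your version).
import sglang as sgl

llm = sgl.Engine(model_path="XiaomiMiMo/MiMo-7B-Base", trust_remote_code=True)
outputs = llm.generate(
    ["def quicksort(arr):"],
    {"temperature": 0.0, "max_new_tokens": 128},
)
print(outputs[0]["text"])
llm.shutdown()
```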