XiaomiMiMo/MiMo-7B-Base
MiMo-7B-Base is a 7 billion parameter base language model developed by Xiaomi, specifically engineered to unlock and enhance reasoning capabilities. Pre-trained on approximately 25 trillion tokens with a focus on reasoning pattern density and synthetic data, it incorporates Multiple-Token Prediction (MTP) for improved performance and accelerated inference. This model is designed to serve as a strong foundation for advanced reasoning tasks, particularly in mathematics and code, with a context length of 32768 tokens.
MiMo-7B-Base: Unlocking Reasoning Potential
MiMo-7B-Base is a 7 billion parameter foundational language model from Xiaomi, designed to excel at reasoning tasks. Unlike many models that rely primarily on post-training to elicit reasoning, MiMo-7B-Base integrates reasoning-centric strategies directly into its pre-training phase. The goal is a base model with exceptional inherent reasoning potential, one that outperforms even significantly larger models on several reasoning benchmarks.
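For orientation, here is a minimal sketch of plain text completion with the checkpoint via Hugging Face transformers. The `trust_remote_code=True` flag is an assumption based on the model's custom MTP components; verify against the model card before use.

```python
# Minimal sketch: text completion with MiMo-7B-Base through transformers.
# trust_remote_code=True is assumed to be needed for the custom MTP modules.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # place layers on available devices
    trust_remote_code=True,
)

# This is a base model: prompt it with raw text, not a chat template.
prompt = "Question: If 3x + 5 = 20, what is x?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```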
Key Capabilities & Innovations
- Reasoning-Optimized Pre-Training: The model's pre-training data pipeline is optimized for reasoning pattern density, utilizing enhanced text extraction toolkits and multi-dimensional data filtering. It also incorporates massive, diverse synthetic reasoning data.
- Massive Pre-Training Scale: MiMo-7B-Base was pre-trained on an estimated 25 trillion tokens using a three-stage data mixture strategy.
- Multiple-Token Prediction (MTP): This additional training objective enhances model performance and significantly accelerates inference, with an acceptance rate of about 90% when speculatively decoding with one MTP layer (a usage sketch follows this list).
- Strong Foundation for Post-Training: The base model demonstrates robust reasoning potential, serving as an effective starting point for supervised fine-tuning (SFT) and reinforcement learning (RL) to achieve superior performance in mathematics and code reasoning.
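As a rough illustration of the MTP speedup, the sketch below enables one speculative token at load time. It assumes Xiaomi's custom vLLM fork, where the `num_speculative_tokens` argument reportedly wires the MTP layer into speculative decoding; behavior may differ in upstream vLLM.

```python
# Sketch: speculative decoding via the MTP layer, assuming Xiaomi's vLLM fork.
# num_speculative_tokens=1 drafts one token per step with the MTP head; at the
# reported ~90% acceptance rate, most drafts are kept, accelerating decoding.
from vllm import LLM, SamplingParams

llm = LLM(
    model="XiaomiMiMo/MiMo-7B-Base",
    trust_remote_code=True,
    num_speculative_tokens=1,
)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Prove that the product of two odd integers is odd."], params)
print(outputs[0].outputs[0].text)
```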
Good For
- Developing Reasoning-Focused LLMs: Ideal as a base model for researchers and developers looking to build or fine-tune models specifically for complex reasoning challenges.
- Mathematics and Code Tasks: Provides a strong foundation for models intended for mathematical problem-solving and code generation/understanding, as evidenced by its performance in the MiMo-7B series.
- Efficient Inference: The integration of MTP makes it suitable for applications requiring accelerated inference, particularly when deployed with compatible engines such as SGLang or Xiaomi's custom vLLM fork (see the serving sketch below).
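For serving, a hedged sketch using SGLang's offline engine API follows; the `Engine` constructor and its argument names are assumptions that may vary across SGLang versions, so check the docs for your installed release.

```python
# Sketch: offline batch inference with SGLang (API names assumed; consult the
# SGLang documentation for your version).
import sglang as sgl

llm = sgl.Engine(model_path="XiaomiMiMo/MiMo-7B-Base", trust_remote_code=True)
outputs = llm.generate(
    ["def quicksort(arr):"],
    {"temperature": 0.0, "max_new_tokens": 128},
)
print(outputs[0]["text"])
llm.shutdown()
```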