XiaomiMiMo/MiMo-7B-SFT
MiMo-7B-SFT is a 7 billion parameter instruction-tuned causal language model developed by XiaomiMiMo, designed to unlock and enhance reasoning capabilities. It is part of the MiMo-7B series, which optimizes both pre-training and post-training for mathematical and code reasoning tasks. With a 32K context length, MiMo-7B-SFT serves as a strong foundation for further RL training and performs strongly on mathematics and code benchmarks.
MiMo-7B-SFT: A Reasoning-Focused Language Model
MiMo-7B-SFT is a 7 billion parameter instruction-tuned model from the XiaomiMiMo series, engineered to excel at reasoning tasks. Unlike approaches that rely on much larger base models for reasoning, the MiMo-7B series is trained from scratch with a focus on unlocking inherent reasoning potential through optimized pre-training and post-training strategies.
Key Capabilities & Features
- Reasoning-Optimized Pre-Training: The base model, MiMo-7B-Base, was pre-trained on approximately 25 trillion tokens using a three-stage data mixture strategy and an enhanced data preprocessing pipeline to increase reasoning pattern density. It also incorporates Multiple-Token Prediction (MTP) as an additional training objective for improved performance and accelerated inference.
- Instruction Fine-Tuning (SFT): MiMo-7B-SFT is the supervised fine-tuned version of the base model, serving as a robust starting point for further reinforcement learning (RL) to achieve superior performance in mathematics and code reasoning.
- Strong Performance: While MiMo-7B-SFT is an intermediate model in the MiMo series, it demonstrates strong capabilities in mathematics and code benchmarks, outperforming the base model and setting the stage for the advanced MiMo-7B-RL variants.
- Efficient Inference: The model supports MTP, which, when used with speculative decoding, can achieve high acceptance rates, leading to faster inference. It is supported in SGLang and a custom vLLM fork.
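The inference benefit of MTP can be illustrated with the standard speculative decoding expectation: with per-token acceptance rate α and k drafted tokens, the expected number of tokens committed per target-model forward pass is (1 − α^(k+1)) / (1 − α). A minimal sketch of this arithmetic (the acceptance rates below are illustrative values, not measured MiMo-7B numbers):

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens committed per target-model forward pass in
    speculative decoding, given per-token acceptance rate `alpha`
    and `k` drafted tokens (standard greedy-acceptance estimate)."""
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# Illustrative acceptance rates only; not measured MiMo-7B figures.
for alpha in (0.6, 0.8, 0.9):
    # One MTP head drafting a single extra token corresponds to k=1.
    rate = expected_tokens_per_pass(alpha, k=1)
    print(f"acceptance={alpha:.1f} -> ~{rate:.2f} tokens per pass")
```

Higher acceptance rates translate directly into fewer target-model passes per generated token, which is why a high MTP acceptance rate matters for serving throughput.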
Good For
- Developing Reasoning-Centric Applications: Ideal for tasks requiring strong mathematical and code problem-solving abilities.
- Further Fine-tuning: Serves as an excellent SFT checkpoint for developers looking to apply additional RL or domain-specific fine-tuning.
- Research in Reasoning LLMs: Provides a valuable open-source model for exploring advanced pre-training and post-training techniques for reasoning.
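For the further fine-tuning use case, training examples are typically packaged as serialized prompt/response pairs. A minimal sketch, assuming a simple instruction/response JSONL layout (the field names are illustrative, not a schema specified by MiMo):

```python
import json


def to_sft_record(instruction: str, response: str) -> str:
    """Serialize one instruction/response pair as a JSONL line.
    Field names are illustrative; adapt them to your trainer's schema."""
    return json.dumps({"instruction": instruction, "response": response})


record = to_sft_record(
    "Compute 12 * 8 and explain the steps.",
    "12 * 8 = 96, since 12 * 8 = 12 * 10 - 12 * 2 = 120 - 24 = 96.",
)
print(record)
```

One such line per example yields a JSONL file that most SFT training pipelines can consume after mapping the fields to their expected keys.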