XiaomiMiMo/MiMo-7B-RL-Zero
XiaomiMiMo/MiMo-7B-RL-Zero is a 7 billion parameter language model developed by the Xiaomi LLM-Core Team and optimized for reasoning tasks. It is an RL-trained variant of the MiMo-7B base model with strong performance in mathematics and code reasoning, a 32K context length, and Multi-Token Prediction (MTP) for faster inference.
MiMo-7B-RL-Zero: A Reasoning-Focused 7B Language Model
MiMo-7B-RL-Zero is a 7 billion parameter model in the XiaomiMiMo series, developed by the Xiaomi LLM-Core Team to excel at reasoning tasks. Unlike many approaches that rely on much larger base models for reasoning, MiMo-7B-RL-Zero delivers strong mathematics and code performance at 7B scale, matching larger 32B models and OpenAI o1-mini on certain benchmarks.
Key Capabilities & Innovations
- Reasoning-Centric Pre-Training: The base MiMo-7B model was pre-trained from scratch with an optimized data pipeline, focusing on increasing reasoning pattern density and generating massive synthetic reasoning data. It utilized a three-stage data mixture strategy over approximately 25 trillion tokens.
- Reinforcement Learning (RL) Optimization: MiMo-7B-RL-Zero is the result of RL training applied directly to the MiMo-7B base model. This process involved curating 130K mathematics and code problems, using rule-based accuracy rewards, and introducing a test difficulty-driven code reward to address sparse reward issues.
- Multi-Token Prediction (MTP): The model incorporates MTP as an additional training objective, improving quality and accelerating inference. With one MTP layer, it achieves an acceptance rate of about 90% in speculative decoding.
- Efficient RL Infrastructure: The team developed a Seamless Rollout Engine that speeds up RL training and validation by integrating continuous rollout, asynchronous reward computation, and early termination.
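To illustrate why a high MTP acceptance rate matters, here is a toy sketch of greedy speculative decoding, the mechanism MTP accelerates. The `draft_model` and `target_model` functions below are hypothetical stand-ins, not the real MiMo networks; in MiMo, the MTP layer plays the draft role, and every accepted draft token is a forward pass of the full model saved.

```python
def draft_model(prefix):
    # Hypothetical cheap predictor: next token is the prefix sum mod 5.
    return sum(prefix) % 5

def target_model(prefix):
    # Hypothetical full model: agrees with the draft except every 4th step.
    t = sum(prefix) % 5
    return t if len(prefix) % 4 else (t + 1) % 5

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then keep the longest prefix the target
    model agrees with; on the first disagreement, substitute the target's
    token and stop (greedy variant of speculative decoding)."""
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    accepted = list(prefix)
    for tok in proposal[len(prefix):]:
        if target_model(accepted) == tok:
            accepted.append(tok)  # draft accepted: full-model step saved
        else:
            accepted.append(target_model(accepted))  # correct and stop
            break
    return accepted
```

The higher the fraction of draft tokens the target accepts (about 90% with one MTP layer, per the figures above), the more full-model forward passes each step amortizes.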
Performance Highlights
In evaluations, MiMo-7B-RL-Zero improves markedly over its base model on reasoning benchmarks, achieving 93.6 Pass@1 on MATH500, 56.4 Pass@1 on AIME 2024, and 49.1 Pass@1 on LiveCodeBench v5, demonstrating strong mathematical and coding reasoning.
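For context on the metric: Pass@1 figures like these are typically computed with the standard unbiased pass@k estimator (it is an assumption that this exact estimator was used here, but it is the convention for such benchmarks). A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c of them correct),
    passes the check."""
    if n - c < k:  # every size-k draw must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 this reduces to the fraction of correct generations, so a 93.6 Pass@1 on MATH500 means roughly 93.6% of sampled solutions are correct.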
Good for
- Complex Reasoning Tasks: Ideal for applications requiring strong mathematical problem-solving and code generation/understanding.
- Efficient Inference: MTP enables speculative decoding with a high acceptance rate, speeding up generation.
- Research and Development: Provides a strong foundation for further research into reasoning-focused LLMs, particularly in the 7B parameter class.