MiMo-7B-RL-0530: A Reasoning-Focused 7B LLM
MiMo-7B-RL-0530 is a 7 billion parameter language model from XiaomiMiMo, designed to unlock and enhance reasoning potential, particularly in mathematics and code. It is an improved version of MiMo-7B-RL, trained with scaled-up SFT data and an expanded RL training window, yielding consistent gains over the earlier checkpoint.
Key Innovations & Capabilities
- Reasoning-Centric Pre-Training: The base MiMo-7B model was pre-trained from scratch with an optimized data preprocessing pipeline, multi-dimensional data filtering to increase reasoning pattern density, and massive synthetic reasoning data generation. It was trained on approximately 25 trillion tokens.
- Multiple-Token Prediction (MTP): Incorporates MTP as an additional training objective to enhance performance and accelerate inference, with an acceptance rate of about 90% for speculative decoding.
- Advanced RL Post-Training: Utilizes a curated dataset of 130K mathematics and code problems for RL training, employing rule-based accuracy rewards. It introduces a test difficulty-driven code reward to mitigate sparse reward issues for challenging code problems and a data re-sampling strategy for efficient policy updates.
- Strong Performance: Demonstrates strong results on both mathematics and code reasoning tasks. The MiMo-7B-RL-0530 variant improves significantly over MiMo-7B-RL, achieving 97.2% on MATH500, 80.1% on AIME 2024, and 60.9% on LiveCodeBench v5.
- Efficient RL Infrastructure: Features a Seamless Rollout Engine for accelerated RL training and validation, achieving 2.29x faster training and 1.96x faster validation by minimizing GPU idle time.
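The test difficulty-driven code reward above can be illustrated with a short sketch. This is a minimal toy, not MiMo's actual training code: the function name and the idea of weighting each unit test by an estimated difficulty score are assumptions made for illustration. Passing only easy tests still yields a dense, partial signal rather than an all-or-nothing one.

```python
def difficulty_weighted_reward(test_results, difficulties):
    """Toy sketch of a test difficulty-driven code reward.

    test_results: list of bools, whether the generated code passed each unit test.
    difficulties: list of positive floats, estimated difficulty of each test
                  (e.g. the fraction of reference solutions that fail it).

    Harder tests contribute more reward, so partial solutions on hard
    problems still receive a non-zero, informative learning signal.
    """
    assert len(test_results) == len(difficulties)
    total = sum(difficulties)
    earned = sum(d for passed, d in zip(test_results, difficulties) if passed)
    return earned / total if total > 0 else 0.0

# Hypothetical example: the solution passes the two easy tests, fails the hard one.
partial = difficulty_weighted_reward([True, True, False], [0.2, 0.3, 0.9])
full = difficulty_weighted_reward([True, True, True], [0.2, 0.3, 0.9])
```

A fully correct solution scores 1.0, while the partial solution above earns 0.5 / 1.4 of the reward, which is exactly the kind of dense feedback that mitigates sparse rewards on challenging code problems.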
Recommended Use Cases
- Mathematical Problem Solving: Excels in complex mathematical reasoning, as evidenced by high scores on MATH500 and AIME benchmarks.
- Code Generation and Debugging: Strong performance on LiveCodeBench indicates its suitability for code-related tasks.
- General Reasoning Tasks: While specialized, its robust pre-training and RL fine-tuning contribute to strong general reasoning capabilities, as seen in GPQA-Diamond scores.
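Benchmarks like MATH500 are typically scored with a rule-based exact match on the final answer, the same style of accuracy reward used in the RL recipe described above. The sketch below assumes the common convention of extracting the last `\boxed{...}` expression from the response; the helper names are illustrative, not MiMo's actual evaluation harness.

```python
import re

def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in a model response,
    or None if absent. Handles one level of nested braces."""
    matches = re.findall(r"\\boxed\{([^{}]*(?:\{[^{}]*\}[^{}]*)*)\}", text)
    return matches[-1].strip() if matches else None

def accuracy_reward(response, gold):
    """Rule-based accuracy reward: 1.0 on exact final-answer match, else 0.0."""
    pred = extract_boxed(response)
    return 1.0 if pred is not None and pred == gold.strip() else 0.0

# Hypothetical responses, for illustration only.
hit = accuracy_reward("The roots sum to \\boxed{7}.", "7")
miss = accuracy_reward("I am not sure of the answer.", "7")
```

Real harnesses usually add answer normalization (e.g. treating `0.5` and `\frac{1}{2}` as equal) before comparison; exact string match is the simplest rule-based form.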
For more technical details, refer to the Technical Report.