XiaomiMiMo/MiMo-7B-RL

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 32k · Published: Apr 29, 2025 · License: MIT · Architecture: Transformer · Open weights

XiaomiMiMo/MiMo-7B-RL is a 7 billion parameter causal language model developed by XiaomiMiMo, specifically engineered to unlock and enhance reasoning capabilities in both mathematics and code. With a 32,768-token context length, it is the reinforcement-learning-trained version of an SFT model and demonstrates strong performance on reasoning tasks, matching OpenAI o1-mini. It combines a pre-training strategy that increases reasoning pattern density with a novel post-training recipe built on rule-based accuracy rewards and test difficulty-driven code rewards. MiMo-7B-RL is optimized for complex problem-solving in STEM fields and programming.


Overview

XiaomiMiMo/MiMo-7B-RL is a 7 billion parameter language model developed by XiaomiMiMo, specifically designed to excel at reasoning tasks across mathematics and code. Despite its small footprint, MiMo-7B-RL demonstrates strong reasoning ability, even surpassing some 32B models. It achieves this through a comprehensive approach that optimizes both pre-training and post-training, focusing on enhancing the inherent reasoning capabilities of the base model.

Key Capabilities & Innovations

  • Reasoning-Focused Pre-Training: The base model, MiMo-7B-Base, was pre-trained on approximately 25 trillion tokens with an optimized data pipeline that increases reasoning pattern density, including large volumes of diverse synthetic reasoning data and a three-stage data mixture strategy. It also incorporates Multiple-Token Prediction (MTP) as an additional training objective, improving performance and accelerating inference.
  • Advanced Post-Training (RL): MiMo-7B-RL is the result of Reinforcement Learning (RL) applied to an SFT model, using a curated dataset of 130K mathematics and code problems scored with rule-based accuracy rewards. To counter sparse rewards on challenging code problems, it introduces a test difficulty-driven code reward that assigns fine-grained partial credit based on the difficulty of each test case.
  • Performance: MiMo-7B-RL shows strong performance on benchmarks like MATH500 (95.8 Pass@1), AIME 2024 (68.2 Pass@1), and LiveCodeBench v5 (57.8 Pass@1), often matching or exceeding models like OpenAI o1-mini in its category.
  • Efficient RL Infrastructure: Features a Seamless Rollout Engine for accelerated RL training and validation, integrating continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time.
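The test difficulty-driven code reward described above can be illustrated with a minimal sketch. The function, weighting scheme, and input format below are assumptions for illustration only; MiMo's actual reward formulation is defined by its training recipe, not this code.

```python
# Illustrative sketch of a difficulty-weighted code reward.
# The weights and function signature are assumptions, not the
# released MiMo implementation.

def difficulty_weighted_reward(results):
    """Score a candidate solution from per-test outcomes.

    results: list of (passed: bool, difficulty: float) tuples,
             where difficulty is a positive weight (harder test -> larger).
    Returns a score in [0, 1]: the difficulty-weighted pass rate.
    Passing hard tests earns proportionally more credit than passing
    easy ones, giving denser feedback than an all-or-nothing reward.
    """
    if not results:
        return 0.0
    total = sum(d for _, d in results)
    earned = sum(d for passed, d in results if passed)
    return earned / total

# A solution that passes only the easy tests receives partial credit
# instead of a sparse 0/1 signal:
score = difficulty_weighted_reward([(True, 0.2), (True, 0.3), (False, 1.0)])
```

The key point is that partial credit scales with test difficulty, so RL updates still receive a gradient signal on problems the model cannot yet fully solve.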

When to Use This Model

  • Complex Reasoning Tasks: Ideal for applications requiring strong mathematical and logical reasoning, such as solving competitive programming problems or advanced STEM questions.
  • Code Generation and Problem Solving: Excels in generating and debugging code, particularly for challenging problems where fine-grained feedback is crucial.
  • Efficiency-Sensitive Applications: Its 7B parameter size combined with MTP support allows for efficient inference, making it suitable for scenarios where performance and speed are critical.
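For the use cases above, a minimal inference sketch with Hugging Face transformers might look like the following. The sampling parameters are illustrative choices, not the model authors' recommended settings, and running it requires downloading the model weights.

```python
# Minimal inference sketch for MiMo-7B-RL via Hugging Face transformers.
# Sampling parameters below are illustrative, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Prove that the sum of two odd integers is even."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=512, do_sample=True, temperature=0.6
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Reasoning models often emit long chains of thought, so budget `max_new_tokens` generously for math and code problems.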