MMR1/MMR1-7B-RL

Modality: Vision · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 32k · Published: Sep 25, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

MMR1/MMR1-7B-RL is a 7 billion parameter multimodal reasoning model developed by Sicong Leng et al. It is designed to strengthen multimodal reasoning, particularly on mathematics-related tasks, using Variance-Aware Sampling (VAS) to stabilize reinforcement-learning fine-tuning. The model achieves state-of-the-art performance among 7B-scale reasoning models on benchmarks such as MathVerse and MathVista, making it suitable for complex analytical applications that require long chain-of-thought reasoning.


Overview

MMR1-7B-RL focuses on enhancing complex multimodal reasoning, especially in mathematics. It addresses two key limitations of large multimodal reasoning models: the scarcity of open, high-quality long chain-of-thought (CoT) data, and the instability of reinforcement learning (RL) algorithms during post-training.

Key Contributions & Methodology

  • Variance-Aware Sampling (VAS): A data selection strategy that scores each training question with a Variance Promotion Score (VPS), combining outcome variance and trajectory diversity, to prioritize prompts that yield informative gradients. This stabilizes policy optimization, improves convergence, and mitigates gradient vanishing during RL fine-tuning.
  • Large-scale Curated Resources: The project provides ~1.6M long CoT cold-start examples (MMR1-SFT Dataset) and ~15k RL QA pairs (MMR1-RL Dataset), curated for quality, difficulty, and diversity across domains such as mathematics, science, charts, and document tables.
  • Open-source Models: MMR1 offers a family of open-source multimodal reasoning models (3B, 7B, 32B) and a fully reproducible training codebase, establishing standardized baselines.
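
The VAS idea above can be sketched as a simple scoring-and-sampling loop. This is a minimal illustration, not the paper's exact formulation: the diversity proxy (token-set Jaccard distance between sampled trajectories) and the mixing weight `lam` are assumptions introduced here for clarity.

```python
import random
from itertools import combinations

def outcome_variance(correct_flags):
    # Bernoulli variance of the pass rate: p * (1 - p).
    # Highest (0.25) when the model solves the question ~half the time,
    # zero when it always succeeds or always fails.
    p = sum(correct_flags) / len(correct_flags)
    return p * (1 - p)

def trajectory_diversity(trajectories):
    # Illustrative diversity proxy: mean pairwise Jaccard distance
    # between the token sets of sampled reasoning trajectories.
    if len(trajectories) < 2:
        return 0.0
    dists = []
    for a, b in combinations(trajectories, 2):
        sa, sb = set(a.split()), set(b.split())
        union = sa | sb
        dists.append(1 - len(sa & sb) / len(union) if union else 0.0)
    return sum(dists) / len(dists)

def variance_promotion_score(correct_flags, trajectories, lam=0.5):
    # VPS as a weighted combination of outcome variance and diversity;
    # lam is a hypothetical mixing weight, not a value from the paper.
    return lam * outcome_variance(correct_flags) + \
        (1 - lam) * trajectory_diversity(trajectories)

def sample_batch(questions, vps_scores, batch_size):
    # Draw questions with probability proportional to their VPS,
    # favoring prompts whose rollouts give non-vanishing gradients.
    return random.choices(questions, weights=vps_scores, k=batch_size)
```

Questions the policy always solves (or always fails) get zero outcome variance and are sampled rarely, which is the intuition behind VAS's mitigation of gradient vanishing.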

Performance

MMR1-7B-RL achieves an average score of 58.4 on a suite of mathematics-related multimodal reasoning benchmarks (MathVerse, MathVista, MathVision, LogicVista, ChartQA), setting a new state-of-the-art among 7B-scale reasoning models. This demonstrates the effectiveness of VAS and the curated CoT training data.

Use Cases

This model is particularly well-suited for applications requiring advanced multimodal reasoning, especially in scientific and mathematical domains where complex, multi-step problem-solving and long chain-of-thought generation are critical.