MMR1/MMR1-7B-RL

Modality: Vision · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 32k · Published: Sep 25, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

MMR1/MMR1-7B-RL is a 7 billion parameter multimodal reasoning model developed by Sicong Leng et al. It is designed to strengthen multimodal reasoning, particularly on mathematics-related tasks, using Variance-Aware Sampling (VAS) to stabilize reinforcement-learning fine-tuning. The model achieves state-of-the-art performance among 7B-scale reasoning models on benchmarks such as MathVerse and MathVista, making it suitable for complex analytical applications that require long chain-of-thought reasoning.


Overview

MMR1-7B-RL focuses on enhancing complex multimodal reasoning, especially in mathematics. It addresses two key limitations of large multimodal reasoning models: the scarcity of open, high-quality long chain-of-thought (CoT) data, and the instability of reinforcement learning (RL) algorithms during post-training.

Key Contributions & Methodology

  • Variance-Aware Sampling (VAS): A data selection strategy that scores each training question with a Variance Promotion Score (VPS), combining outcome variance and trajectory diversity, to prioritize prompts that yield informative gradients. This stabilizes policy optimization, improves convergence, and mitigates gradient vanishing during RL fine-tuning.
  • Large-scale Curated Resources: The project provides ~1.6M long CoT cold-start examples (MMR1-SFT Dataset) and ~15k RL QA pairs (MMR1-RL Dataset), curated for quality, difficulty, and diversity across domains such as mathematics, science, charts, and document tables.
  • Open-source Models: MMR1 offers a family of open-source multimodal reasoning models (3B, 7B, 32B) and a fully reproducible training codebase, establishing standardized baselines.
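
The VAS idea above can be sketched as a simple scoring-and-sampling loop. This is a minimal illustration, not the paper's exact formulation: the diversity proxy (token-set Jaccard distance between sampled trajectories) and the mixing weight `lam` are assumptions introduced here for clarity.

```python
import random
from itertools import combinations

def outcome_variance(correct_flags):
    # Bernoulli variance of the pass rate: p * (1 - p).
    # Highest (0.25) when the model solves the question ~half the time,
    # zero when it always succeeds or always fails.
    p = sum(correct_flags) / len(correct_flags)
    return p * (1 - p)

def trajectory_diversity(trajectories):
    # Illustrative diversity proxy: mean pairwise Jaccard distance
    # between the token sets of sampled reasoning trajectories.
    if len(trajectories) < 2:
        return 0.0
    dists = []
    for a, b in combinations(trajectories, 2):
        sa, sb = set(a.split()), set(b.split())
        union = sa | sb
        dists.append(1 - len(sa & sb) / len(union) if union else 0.0)
    return sum(dists) / len(dists)

def variance_promotion_score(correct_flags, trajectories, lam=0.5):
    # VPS as a weighted combination of outcome variance and diversity;
    # lam is a hypothetical mixing weight, not a value from the paper.
    return lam * outcome_variance(correct_flags) + \
        (1 - lam) * trajectory_diversity(trajectories)

def sample_batch(questions, vps_scores, batch_size):
    # Draw questions with probability proportional to their VPS,
    # favoring prompts whose rollouts give non-vanishing gradients.
    return random.choices(questions, weights=vps_scores, k=batch_size)
```

Questions the policy always solves (or always fails) get zero outcome variance and are sampled rarely, which is the intuition behind VAS's mitigation of gradient vanishing.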

Performance

MMR1-7B-RL achieves an average score of 58.4 on a suite of mathematics-related multimodal reasoning benchmarks (MathVerse, MathVista, MathVision, LogicVista, ChartQA), setting a new state-of-the-art among 7B-scale reasoning models. This demonstrates the effectiveness of VAS and the curated CoT training data.

Use Cases

This model is particularly well-suited for applications requiring advanced multimodal reasoning, especially in scientific and mathematical domains where complex, multi-step problem-solving and long chain-of-thought generation are critical.