open-r1/OpenR1-Distill-7B

7.6B params · FP8 · 131,072 context · License: apache-2.0

Overview

OpenR1-Distill-7B: Reasoning Capabilities from DeepSeek-R1

OpenR1-Distill-7B is a 7.6 billion parameter language model developed by open-r1, post-trained from a modified version of Qwen/Qwen2.5-Math-7B. Its distinguishing feature is training on the Mixture-of-Thoughts dataset, a curated collection of 350,000 verified reasoning traces distilled from the larger DeepSeek-R1 model.

Key Capabilities & Features

  • Enhanced Reasoning: Specifically trained to perform step-by-step reasoning across diverse domains including mathematics, coding, and science.
  • DeepSeek-R1 Replication: Aims to reproduce the reasoning performance of DeepSeek-R1 in an open and reproducible 7B parameter model.
  • Extended Context: Built upon a Qwen2.5-Math-7B variant whose RoPE base frequency was extended to 300k, enabling training at a 32k-token context length.
  • Performance Benchmarks: Achieves competitive scores on reasoning benchmarks such as AIME 2024 (52.7), MATH-500 (89.0), GPQA Diamond (52.8), and LiveCodeBench v5 (39.4), closely matching or exceeding DeepSeek-R1-Distill-Qwen-7B.
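The RoPE extension mentioned above can be illustrated with a short sketch: raising the RoPE base frequency stretches the rotary wavelengths, so the slowest-rotating dimension pairs no longer wrap within the training context. The head dimension and default base of 10,000 below are illustrative assumptions, not the model's exact configuration.

```python
import math

def rope_wavelengths(base: float, head_dim: int = 128):
    """Per-dimension-pair wavelengths (in tokens) of rotary position embeddings.

    Pair i rotates with angular frequency base**(-2i/head_dim), so its
    wavelength is 2*pi / freq = 2*pi * base**(2i/head_dim).
    """
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]

default_wl = rope_wavelengths(10_000.0)    # common default base (assumption)
extended_wl = rope_wavelengths(300_000.0)  # base used for the 32k-context variant

# The slowest-rotating pair bounds the longest position the encoding can
# represent without wrapping; a larger base pushes it far beyond 32k tokens.
print(f"max wavelength @ base 10k:  {default_wl[-1]:,.0f} tokens")
print(f"max wavelength @ base 300k: {extended_wl[-1]:,.0f} tokens")
```

With the larger base, even the longest wavelength comfortably exceeds the 32k training context, which is the point of the modification.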

Ideal Use Cases

  • Research on Inference-Time Compute: Provides a strong baseline for exploring efficient reasoning at inference.
  • Reinforcement Learning with Verifiable Rewards (RLVR): Suitable for developing and testing RL systems that require verifiable reasoning steps.
  • Mathematical and Scientific Problem Solving: Excels in tasks requiring logical deduction and multi-step solutions.
  • Code Generation and Analysis: Demonstrates proficiency in coding-related reasoning challenges.