Overview
OpenR1-Distill-7B: Reasoning Capabilities from DeepSeek-R1
OpenR1-Distill-7B is a 7.6-billion-parameter language model developed by the open-r1 project, post-trained from a modified version of Qwen/Qwen2.5-Math-7B. Its core contribution is its training on the Mixture-of-Thoughts dataset, a curated collection of 350,000 verified reasoning traces distilled from the larger DeepSeek-R1 model.
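The model can be used like any other causal language model on the Hugging Face Hub. Below is a minimal usage sketch with the transformers library; the repository id follows the card above, and the generation parameters (temperature, top_p, token budget) are illustrative choices, not the authors' recommended settings.

```python
# Minimal sketch: load OpenR1-Distill-7B and generate a step-by-step answer.
# Sampling settings below are illustrative assumptions, not official defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-r1/OpenR1-Distill-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```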
Key Capabilities & Features
- Enhanced Reasoning: Specifically trained to perform step-by-step reasoning across diverse domains including mathematics, coding, and science.
- DeepSeek-R1 Replication: Aims to reproduce the reasoning performance of DeepSeek-R1's distilled 7B model in a fully open and reproducible 7B-parameter model.
- Extended Context: Built upon a Qwen2.5-Math-7B variant whose RoPE base frequency was extended to 300k, enabling training on a 32k-token context (see the configuration sketch after this list).
- Performance Benchmarks: Achieves competitive scores on reasoning benchmarks such as AIME 2024 (52.7), MATH-500 (89.0), GPQA Diamond (52.8), and LiveCodeBench v5 (39.4), closely matching or exceeding DeepSeek-R1-Distill-Qwen-7B.
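For readers curious how the context extension mentioned above can be expressed in code, the sketch below shows one way to override the RoPE base frequency and maximum position embeddings when loading the base model with transformers. The values mirror the card's description (rope_theta of 300k, 32k context), but the authors' exact training configuration is not reproduced here; this is an assumption-level illustration.

```python
# Sketch: apply a RoPE base-frequency extension to Qwen2.5-Math-7B.
# The specific values mirror the card's description; the real training
# setup used by open-r1 may differ.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Math-7B")
config.rope_theta = 300_000              # raise the RoPE base frequency
config.max_position_embeddings = 32_768  # allow 32k-token training sequences

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-7B", config=config
)
```

Raising the RoPE base frequency stretches the positional encoding so that attention remains well-behaved at sequence lengths far beyond the base model's original window, which is what makes long reasoning traces trainable.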
Ideal Use Cases
- Research on Inference-Time Compute: Provides a strong baseline for studying how reasoning performance scales with compute spent at inference.
- Reinforcement Learning with Verifiable Rewards (RLVR): Suitable for developing and testing RL systems that score completions with programmatic verifiers (a toy reward function is sketched after this list).
- Mathematical and Scientific Problem Solving: Excels in tasks requiring logical deduction and multi-step solutions.
- Code Generation and Analysis: Demonstrates proficiency in coding-related reasoning challenges.
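To make the RLVR use case concrete, here is a toy verifiable-reward function of the kind such pipelines rely on: it extracts a final answer from a completion and returns a binary reward against a reference. The `\boxed{...}` answer convention and the helper names are illustrative assumptions borrowed from common math-reasoning formats, not part of the model's specification.

```python
# Toy verifiable reward for RLVR-style training: 1.0 iff the completion's
# final \boxed{...} answer exactly matches the reference string.
# The \boxed convention and function names are illustrative assumptions.
import re

def boxed_answer(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in a completion, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 on an exact answer match, 0.0 otherwise."""
    answer = boxed_answer(completion)
    return 1.0 if answer is not None and answer == reference else 0.0

# Example: a completion ending in \boxed{5050} scores 1.0 against "5050".
print(verifiable_reward(r"... so the total is \boxed{5050}.", "5050"))
```

Because the reward is computed by a deterministic checker rather than a learned model, it cannot be gamed by fluent-but-wrong text, which is the property RLVR setups depend on.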