open-r1/OpenR1-Distill-7B

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: May 22, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

OpenR1-Distill-7B is a 7.6-billion-parameter, GPT-style language model post-trained by open-r1 from a variant of Qwen/Qwen2.5-Math-7B with an extended RoPE base frequency for a 32k-token context. It is designed to replicate the reasoning capabilities of DeepSeek-R1 and was trained on 350k verified reasoning traces, distilled from that model, spanning mathematics, coding, and science tasks. The model excels at step-by-step reasoning and is well suited to research on inference-time compute and reinforcement learning with verifiable rewards.


OpenR1-Distill-7B: Reasoning Capabilities from DeepSeek-R1

OpenR1-Distill-7B is a 7.6-billion-parameter language model developed by open-r1, post-trained from a modified version of Qwen/Qwen2.5-Math-7B. Its core innovation is its training data: the Mixture-of-Thoughts dataset, a curated collection of 350,000 verified reasoning traces distilled from the larger DeepSeek-R1 model.
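Since the model ships as open weights on the Hugging Face Hub, a minimal chat sketch with the `transformers` library might look as follows. The sampling values are illustrative assumptions, not settings prescribed by the model card, and the chat template is whatever ships with the tokenizer.

```python
def ask(prompt: str, model_id: str = "open-r1/OpenR1-Distill-7B") -> str:
    """Return the model's step-by-step answer to `prompt`.

    Downloads the full weights on first call, so this needs a GPU with
    enough memory for a 7.6B model. Heavy deps are imported lazily.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Illustrative sampling settings for long reasoning traces.
    out = model.generate(
        inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
    )
    # Strip the prompt tokens and return only the generated continuation.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example (uncomment to run on suitable hardware):
# print(ask("Solve: if 3x + 5 = 20, what is x?"))
```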

Key Capabilities & Features

  • Enhanced Reasoning: Specifically trained to perform step-by-step reasoning across diverse domains including mathematics, coding, and science.
  • DeepSeek-R1 Replication: Aims to reproduce the reasoning performance of DeepSeek-R1 in an open and reproducible 7B parameter model.
  • Extended Context: Built on a Qwen2.5-Math-7B variant whose RoPE base frequency was raised to 300k, enabling training with a 32k-token context.
  • Performance Benchmarks: Achieves competitive scores on reasoning benchmarks such as AIME 2024 (52.7), MATH-500 (89.0), GPQA Diamond (52.8), and LiveCodeBench v5 (39.4), closely matching or exceeding DeepSeek-R1-Distill-Qwen-7B.
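The effect of the RoPE base-frequency change can be illustrated numerically. Assuming the standard RoPE default base of 10,000 and a head dimension of 128 (both assumptions about the base architecture, not stated on this card), raising the base to 300k slows every rotation pair, so far more dimensions complete less than one full revolution over a 32k window:

```python
import math

HEAD_DIM = 128   # assumed head dimension for the 7B Qwen2.5 architecture
CONTEXT = 32_768 # target context length in tokens

def rope_wavelengths(base: float, head_dim: int = HEAD_DIM) -> list[float]:
    # RoPE rotates dimension pair i at frequency base**(-2i/d); the wavelength
    # (tokens per full revolution) is the inverse frequency times 2*pi.
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]

for base in (10_000, 300_000):
    long_dims = sum(w >= CONTEXT for w in rope_wavelengths(base))
    print(f"base={base:>7}: {long_dims} of {HEAD_DIM // 2} rotation pairs "
          f"span less than one revolution over {CONTEXT} tokens")
```

This is a sketch of the underlying mechanism only; the exact base value before extension is not documented here.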

Ideal Use Cases

  • Research on Inference-Time Compute: Provides a strong baseline for exploring efficient reasoning at inference.
  • Reinforcement Learning with Verifiable Rewards (RLVR): Suitable for developing and testing RL systems that require verifiable reasoning steps.
  • Mathematical and Scientific Problem Solving: Excels in tasks requiring logical deduction and multi-step solutions.
  • Code Generation and Analysis: Demonstrates proficiency in coding-related reasoning challenges.

Popular Sampler Settings

The most popular parameter combinations among Featherless users for this model draw on the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
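These parameters map directly onto an OpenAI-compatible chat-completions request body. The sketch below uses illustrative values, not the actual measured combinations (which are not reproduced here); note that `top_k`, `repetition_penalty`, and `min_p` are server-side extensions accepted by many inference backends rather than part of the core OpenAI schema.

```python
# Hypothetical sampler config; values are illustrative placeholders.
payload = {
    "model": "open-r1/OpenR1-Distill-7B",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
    "min_p": 0.05,
}
# POST this as JSON to an OpenAI-compatible /v1/chat/completions endpoint.
```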