Nitish-Garikoti/DeepSeek-R1-Distill-Qwen-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Mar 29, 2026 · License: MIT · Architecture: Transformer · Open Weights

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter language model developed by DeepSeek AI, distilled from the larger DeepSeek-R1 model and based on the Qwen2.5 architecture. It is fine-tuned on reasoning data generated by DeepSeek-R1, with the aim of transferring advanced reasoning patterns to a smaller, more efficient model. It performs strongly on complex reasoning tasks across math, code, and general English and Chinese benchmarks, making it well suited to applications that require robust analytical capabilities.
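As a rough sketch of how such a model is typically prompted: DeepSeek's published usage notes for the R1 series recommend putting all instructions in the user turn (no system prompt) and, for math problems, asking for the final answer in `\boxed{}`. The helper below (`build_r1_prompt` and its `math_mode` flag are hypothetical names for illustration) builds a chat-style message list following those recommendations.

```python
def build_r1_prompt(question: str, math_mode: bool = False) -> list[dict]:
    """Build a chat-style message list for an R1-distill model.

    R1-series usage notes suggest avoiding a system prompt; for math
    problems they suggest asking for the final answer in \\boxed{}.
    `math_mode` is a hypothetical convenience flag for that directive.
    """
    content = question
    if math_mode:
        content += ("\nPlease reason step by step, and put your final "
                    "answer within \\boxed{}.")
    # All instructions go into the single user message.
    return [{"role": "user", "content": content}]

messages = build_r1_prompt("What is 7 * 8?", math_mode=True)
```

The resulting message list can then be passed to any chat-completion interface that serves the model.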


DeepSeek-R1-Distill-Qwen-7B: Reasoning Capabilities in a Compact Model

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter model from DeepSeek AI, part of their DeepSeek-R1 series. It is a distilled version of the larger DeepSeek-R1, which itself was developed using a novel large-scale reinforcement learning (RL) approach without initial supervised fine-tuning (SFT) to foster advanced reasoning. The distillation process fine-tunes smaller base models (Qwen2.5-Math-7B in this case) on reasoning data generated by DeepSeek-R1.

Key Capabilities

  • Advanced Reasoning: Inherits and demonstrates strong reasoning patterns across various domains, including mathematics, coding, and general problem-solving.
  • Efficient Performance: Achieves competitive performance at a much smaller parameter count (7.6B), making it more accessible and efficient than larger models with similar reasoning abilities.
  • Benchmark Excellence: Shows strong results on benchmarks such as AIME 2024 (55.5 pass@1), MATH-500 (92.8 pass@1), and LiveCodeBench (37.6 pass@1), indicating robust analytical and problem-solving skills.
  • Distillation Innovation: Validates the concept that complex reasoning capabilities from larger, RL-trained models can be effectively transferred to smaller, dense models.
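The benchmark scores above are reported as pass@1. For context, a widely used way to estimate pass@k (popularized by code-generation evaluations) is the unbiased estimator 1 - C(n-c, k)/C(n, k) over n sampled completions of which c are correct; whether these exact benchmarks use this estimator is an assumption, but the sketch below illustrates what the metric measures.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them correct,
    evaluation budget of k attempts per problem."""
    if n - c < k:
        # Every size-k subset of samples contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to the fraction of correct samples, c / n.
```

Averaging this quantity over all problems in a benchmark gives the reported pass@k score.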

Good for

  • Complex Problem Solving: Ideal for applications requiring detailed step-by-step reasoning, such as mathematical proofs, code generation, and logical puzzles.
  • Resource-Constrained Environments: Suitable for deployment where computational resources are a consideration, offering high performance without the overhead of much larger models.
  • Research and Development: Provides a strong foundation for further research into model distillation, reasoning transfer, and efficient AI deployment.
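For step-by-step reasoning applications like those above, it helps to separate the model's chain of thought from its final answer. R1-series models conventionally wrap their reasoning in `<think>...</think>` tags before the answer; the helper below (`split_reasoning` is a hypothetical name for illustration) splits a completion on that convention.

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning, final_answer).

    Assumes the reasoning is wrapped in <think>...</think>, the
    convention for DeepSeek-R1-series models; if no tags are found,
    the whole output is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if m is None:
        return "", output.strip()
    reasoning = m.group(1).strip()
    answer = output[m.end():].strip()
    return reasoning, answer

r, a = split_reasoning("<think>7 * 8 is 56.</think>The answer is 56.")
```

Downstream code can then log or display the reasoning separately while acting only on the final answer.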