Bharat2004/DeepSeek-R1-Distill-Qwen-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 20, 2026 · License: MIT · Architecture: Transformer · Open Weights · Cold

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter language model developed by DeepSeek-AI, distilled from the larger DeepSeek-R1 model and based on the Qwen2.5-Math-7B architecture. It is fine-tuned on reasoning data generated by DeepSeek-R1 and excels at mathematical, coding, and general reasoning tasks, making it well suited to applications that require robust logical inference.


DeepSeek-R1-Distill-Qwen-7B Overview

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter model developed by DeepSeek-AI, part of their DeepSeek-R1 series. This model is a distilled version of the larger DeepSeek-R1, fine-tuned from the Qwen2.5-Math-7B base model using reasoning data generated by DeepSeek-R1. The core innovation lies in demonstrating that complex reasoning patterns from larger models can be effectively transferred to smaller, dense models.
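
For reference, here is a minimal inference sketch using the Hugging Face transformers library. The repo id mirrors this page's title and is an assumption about where the weights are hosted; the 0.6 sampling temperature follows the range commonly recommended for the R1 distill family rather than anything stated on this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the page title; substitute your own path if the weights live elsewhere.
model_id = "Bharat2004/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # full-precision-ish load; ~15 GB of GPU memory for 7.6B params
    device_map="auto",
)

messages = [{"role": "user", "content": "What is 17 * 24? Reason step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Temperature 0.6 is a commonly suggested setting for R1-style distills (assumption, not from this page).
output = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```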

Key Capabilities & Features

  • Enhanced Reasoning: Benefits from the distillation of DeepSeek-R1's advanced reasoning patterns, which were developed through large-scale reinforcement learning (RL); the model's output can be split into reasoning and final answer, as in the sketch after this list.
  • Strong Performance: Achieves competitive results across various benchmarks, particularly in math (AIME 2024 pass@1: 55.5, MATH-500 pass@1: 92.8) and code (LiveCodeBench pass@1: 37.6, CodeForces rating: 1189).
  • Efficient Size: At 7.6 billion parameters, it offers a powerful reasoning engine in a more compact form factor compared to its larger counterparts.
  • Qwen2.5 Base: Built on the Qwen2.5-Math-7B base model, inheriting its math-focused pretraining and established capabilities.
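
To illustrate the enhanced-reasoning point above: models in the DeepSeek-R1 family conventionally wrap their chain of thought in <think>...</think> tags before the final answer. The helper below is a small sketch assuming that convention also holds for this distill; the tag format is not documented on this page.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw generated text.

    Assumes the R1-style convention of a <think>...</think> block
    preceding the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no explicit reasoning block emitted
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Usage with a hypothetical generation:
sample = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>The answer is 408."
reasoning, answer = split_reasoning(sample)
print(answer)  # -> The answer is 408.
```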

When to Use This Model

  • Reasoning-Intensive Tasks: Ideal for applications requiring strong logical inference, problem-solving, and chain-of-thought capabilities.
  • Mathematical and Coding Challenges: Excels in benchmarks related to mathematics and code generation/understanding.
  • Resource-Constrained Environments: Provides high reasoning performance in a smaller model size, suitable for deployment where larger models are impractical; see the quantized-loading sketch after this list.
  • Research and Development: Useful for exploring the efficacy of distillation techniques for reasoning capabilities in LLMs.
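
As a sketch of the resource-constrained case, the snippet below loads the model with 4-bit quantization via bitsandbytes. Note that this page advertises an FP8 quant; the 4-bit NF4 configuration here is a substitute illustration for memory-limited local deployment, not the hosted setup, and the repo id is again assumed from the page title.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Bharat2004/DeepSeek-R1-Distill-Qwen-7B"  # assumed repo id from the page title

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
# At 4-bit, the 7.6B weights take roughly 4-5 GB of GPU memory,
# versus ~15 GB in bf16 (7.6B params x 2 bytes).
```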