deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

Parameters: 7.6B
Quantization: FP8
Context length: 131,072 tokens
Released: Jan 20, 2025
License: MIT
Overview

DeepSeek-R1-Distill-Qwen-7B: Reasoning Capabilities in a Compact Model

DeepSeek-R1-Distill-Qwen-7B is a 7.6-billion-parameter model from DeepSeek AI, part of the DeepSeek-R1 release. It was created by fine-tuning the Qwen2.5-Math-7B base model on reasoning data generated by the much larger DeepSeek-R1, demonstrating that complex reasoning capabilities can be transferred effectively to smaller, dense models through distillation.
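Because it ships as a standard causal language model checkpoint, it can be loaded with the Hugging Face transformers library. A minimal sketch, assuming a recent transformers release (plus accelerate for device placement) and a GPU with enough memory for the 7.6B weights:

```python
# Minimal loading sketch using Hugging Face transformers.
# Assumes `transformers` and `accelerate` are installed and a GPU with
# enough memory for the 7.6B weights (roughly 16 GB in BF16) is available.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # place weights on the available GPU(s)
)
```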

Key Capabilities

  • Enhanced Reasoning: Distilled from DeepSeek-R1's reasoning traces, it shows strong performance on math, code, and general reasoning benchmarks (a usage sketch follows this list).
  • Long Context Understanding: Supports a substantial context length of 131,072 tokens, enabling processing of extensive inputs.
  • Performance: Achieves competitive results across various benchmarks, including AIME 2024 (55.5% pass@1), MATH-500 (92.8% pass@1), and LiveCodeBench (37.6% pass@1).
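The sketch below reuses the model and tokenizer from the loading example and prompts the model with a math question. The sampling settings (temperature 0.6, top-p 0.95, no system prompt, final answer requested in \boxed{}) follow the usage recommendations published with the DeepSeek-R1 release; treat the exact values as a starting point rather than a requirement.

```python
# Prompting sketch; reuses `model` and `tokenizer` from the loading example.
# Per the upstream DeepSeek-R1 usage recommendations: no system prompt, all
# instructions in the user turn, temperature ~0.6, top-p ~0.95.
messages = [
    {
        "role": "user",
        "content": "Please reason step by step, and put your final answer "
                   "within \\boxed{}. What is the sum of the first 50 "
                   "positive integers?",
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=2048,  # reasoning traces can be long; leave headroom
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# The model first emits its chain of thought inside <think>...</think> tags,
# then states the final answer.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```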

Good For

  • Complex Problem Solving: Ideal for tasks requiring step-by-step reasoning, such as mathematical proofs, code generation, and logical deduction.
  • Research and Development: Provides a powerful, open-source foundation for further research into model distillation and reasoning capabilities.
  • Applications with Long Contexts: Suitable for workloads that hinge on processing and understanding very long documents or conversations (see the long-context sketch after this list).
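To illustrate the long-context use case, here is a hedged sketch of long-document summarization, again reusing `model` and `tokenizer` from the loading example. The file name report.txt is a hypothetical stand-in for your own input, and the explicit length check guards the 131,072-token window:

```python
# Long-document summarization sketch; reuses `model` and `tokenizer` from
# the loading example. "report.txt" is a placeholder path (assumption).
long_document = open("report.txt").read()

messages = [
    {
        "role": "user",
        "content": f"{long_document}\n\nSummarize the key findings above.",
    }
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Verify the prompt fits inside the 131,072-token context window,
# leaving room for the generated summary.
max_context = 131_072
assert inputs.shape[-1] < max_context - 1024, (
    f"Prompt is {inputs.shape[-1]} tokens; it must fit in the "
    f"{max_context}-token window with headroom for generation."
)

outputs = model.generate(
    inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```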