deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

14.8B parameters · FP8 · 131,072-token context · MIT license

Model Overview

DeepSeek-R1-Distill-Qwen-14B is a 14.8 billion parameter language model from DeepSeek-AI, part of their DeepSeek-R1 series. It is distilled from the larger DeepSeek-R1, whose reasoning capabilities were developed primarily through large-scale reinforcement learning (RL), with only a small amount of cold-start supervised fine-tuning (SFT) beforehand. The distillation process transfers the advanced reasoning patterns of DeepSeek-R1 into this smaller, dense model, which is built on the Qwen2.5 architecture.

Key Capabilities

  • Enhanced Reasoning: Benefits from reasoning patterns distilled from DeepSeek-R1, which demonstrated capabilities like self-verification and reflection.
  • Strong Performance: Achieves competitive results across various benchmarks, particularly in math (AIME 2024 pass@1: 69.7, MATH-500 pass@1: 93.9) and coding (LiveCodeBench pass@1: 53.1, CodeForces rating: 1481).
  • Long Context: Supports a context length of 131,072 tokens, enabling processing of extensive inputs.
  • Distilled Efficiency: Delivers strong reasoning in a much smaller, dense model than its larger parent.

Usage Recommendations

  • Prompting: Avoid system prompts; all instructions should be within the user prompt. For mathematical problems, include a directive like "Please reason step by step, and put your final answer within \boxed{}".
  • Temperature: Set the temperature within 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
  • Enforced Reasoning: The model can occasionally skip its thinking step; to ensure thorough reasoning, enforce it to begin its response with "<think>\n" (see the sketch after this list).
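
A minimal sketch (not part of the official model card) applying these recommendations with Hugging Face transformers. The model id, temperature range, math directive, and "<think>\n" prefix follow this card; prepending the prefix after the chat template is an assumption about how to enforce it, and top_p is a commonly used value rather than a setting stated here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# No system prompt: every instruction, including the math directive, goes in the user turn.
messages = [
    {
        "role": "user",
        "content": "Solve 3x + 5 = 20. Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }
]

# Build the prompt, then append "<think>\n" so the model begins with its reasoning
# instead of skipping the thinking step (assumed placement of the prefix).
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompt += "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,  # recommended range 0.5-0.7
    top_p=0.95,       # assumption: common pairing, not specified in this card
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```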