Model Overview
DeepSeek-R1-Distill-Qwen-14B is a 14.8-billion-parameter dense language model from DeepSeek-AI, part of the DeepSeek-R1 series. It is distilled from the larger DeepSeek-R1, a model developed with large-scale reinforcement learning (RL) to strengthen reasoning capabilities without supervised fine-tuning (SFT) as a preliminary step. Distillation transfers the advanced reasoning patterns of DeepSeek-R1 into this smaller model, which is fine-tuned from Qwen2.5-14B.
Key Capabilities
- Enhanced Reasoning: Benefits from reasoning patterns distilled from DeepSeek-R1, which demonstrated capabilities like self-verification and reflection.
- Strong Performance: Achieves competitive results across various benchmarks, particularly in math (AIME 2024 pass@1: 69.7, MATH-500 pass@1: 93.9) and coding (LiveCodeBench pass@1: 53.1, CodeForces rating: 1481).
- Long Context: Supports a context length of 131,072 tokens, enabling processing of extensive inputs (see the configuration check after this list).
- Distilled Efficiency: Offers powerful reasoning in a more compact form factor compared to its larger parent model.
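The advertised context window can be read directly from the checkpoint's configuration; a quick check, assuming the model is published on the Hugging Face Hub as deepseek-ai/DeepSeek-R1-Distill-Qwen-14B:

```python
from transformers import AutoConfig

# Fetch only the model configuration, not the weights.
cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-14B")
print(cfg.max_position_embeddings)  # expected: 131072
```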
Usage Recommendations
- Prompting: Avoid adding a system prompt; include all instructions within the user prompt. For mathematical problems, add a directive such as "Please reason step by step, and put your final answer within \boxed{}".
- Temperature: Set the temperature within the range 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent outputs.
- Enforced Reasoning: The model can occasionally bypass its thinking pattern; to ensure thorough reasoning, enforce the model to start its response with "<think>\n". The sketch below applies all three recommendations.
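A minimal sketch using Hugging Face transformers that applies the recommendations above: no system prompt, the math directive inside the user turn, temperature 0.6, and a forced "<think>\n" prefix. It assumes the checkpoint is available on the Hugging Face Hub as deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, and that the tokenizer's chat template does not already append the <think> tag on its own; if yours does, drop the manual concatenation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# No system prompt: every instruction, including the math directive,
# goes into the single user turn.
messages = [{
    "role": "user",
    "content": "What is 7^4 - 3^4? Please reason step by step, "
               "and put your final answer within \\boxed{}.",
}]

# Build the prompt, then force the response to open with "<think>\n"
# so the model cannot skip its reasoning trace.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
) + "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,  # recommended range: 0.5-0.7
    top_p=0.95,       # a common companion setting, not mandated here
)
# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```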