Sherckuith/DeepSeek-R1-Distill-Qwen-14B
Sherckuith/DeepSeek-R1-Distill-Qwen-14B is a 14.8-billion-parameter language model developed by DeepSeek-AI, distilled from the larger DeepSeek-R1 model and built on the Qwen2.5 architecture. It is fine-tuned on reasoning data generated by DeepSeek-R1 to strengthen its mathematical, coding, and general reasoning capabilities. The model brings advanced reasoning patterns to a smaller, more efficient dense model, making it suitable for complex problem-solving tasks.
DeepSeek-R1-Distill-Qwen-14B Overview
This model is a 14.8-billion-parameter language model from DeepSeek-AI, part of their DeepSeek-R1-Distill series. It is a distilled version of the larger DeepSeek-R1 model, built on the Qwen2.5 architecture and fine-tuned to inherit and apply advanced reasoning patterns. The core innovation is the demonstration that complex reasoning capabilities, initially developed in larger models through large-scale reinforcement learning (RL), can be effectively transferred to smaller, dense models.
Key Capabilities
- Enhanced Reasoning: Benefits from reasoning data generated by DeepSeek-R1, which itself was developed using large-scale RL to discover sophisticated chain-of-thought (CoT) patterns.
- Strong Performance in Math & Code: Evaluation results indicate strong performance across mathematical benchmarks like AIME 2024 and MATH-500, and coding tasks such as LiveCodeBench and CodeForces.
- Efficient Distillation: Represents a successful approach to distilling the reasoning prowess of much larger models into a more compact form, making advanced AI more accessible.
- Qwen2.5 Base: Leverages the robust foundation of the Qwen2.5 series, ensuring a solid base for its fine-tuned capabilities.
Usage Recommendations
- Optimal Settings: Recommended temperature range of 0.5-0.7 (0.6 ideal) to prevent repetitive or incoherent outputs.
- Prompting: Avoid system prompts; include all instructions within the user prompt. For mathematical problems, include a directive like "Please reason step by step, and put your final answer within \boxed{}".
- Enforced Reasoning: To ensure thorough reasoning, it is recommended to force the model to start its response with "<think>\n", so it does not skip its thinking step.
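Taken together, these recommendations can be sketched as a single generation call with the `transformers` library. This is a minimal sketch, not an official example: the helper names are our own, the dtype choice is an assumption, and loading the 14B checkpoint requires a suitably large GPU.

```python
def build_messages(question: str) -> list[dict]:
    # No system prompt: every instruction lives in the single user turn,
    # including the step-by-step / \boxed{} directive for math problems.
    prompt = (
        question.strip()
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )
    return [{"role": "user", "content": prompt}]


def generate(question: str, max_new_tokens: int = 2048) -> str:
    # Heavy dependencies are imported lazily so the prompt helper stays light.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Sherckuith/DeepSeek-R1-Distill-Qwen-14B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption; pick a dtype your hardware supports
        device_map="auto",
    )

    # Prefill "<think>\n" so the model always opens with its reasoning trace.
    text = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    ) + "<think>\n"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.6,  # recommended range 0.5-0.7; 0.6 is the sweet spot
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The prefilled "<think>\n" and the absence of a system message mirror the recommendations above; all other generation arguments are ordinary `transformers` defaults.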