jpiotrowski/DeepSeek-R1-Distill-Qwen-14B

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 15, 2026 · License: MIT · Architecture: Transformer · Open Weights · Cold

jpiotrowski/DeepSeek-R1-Distill-Qwen-14B is a 14.8-billion-parameter language model distilled from DeepSeek-R1, developed by DeepSeek-AI, and based on the Qwen2.5 architecture. It is fine-tuned on reasoning data generated by the much larger DeepSeek-R1, which was itself trained through large-scale reinforcement learning without an initial supervised fine-tuning stage. The model performs strongly on math, code, and general English and Chinese benchmarks, demonstrating that complex reasoning patterns can be transferred effectively to smaller dense models. With a 32,768-token context length, it suits applications that demand robust analytical capability.


DeepSeek-R1-Distill-Qwen-14B: Reasoning Capabilities in a Compact Model

This model is part of the DeepSeek-R1-Distill series from DeepSeek-AI, which focuses on transferring advanced reasoning capabilities from larger models into more efficient, smaller architectures. DeepSeek-R1-Distill-Qwen-14B is a 14.8-billion-parameter model built on the Qwen2.5 base and fine-tuned on high-quality reasoning data generated by DeepSeek-R1.

Key Capabilities & Features

  • Reasoning Distillation: Leverages reasoning patterns from the 671B-parameter DeepSeek-R1, which was developed using large-scale reinforcement learning (RL) to discover complex chain-of-thought (CoT) reasoning without initial supervised fine-tuning (SFT).
  • Enhanced Performance: Achieves strong results on various benchmarks, including AIME 2024 (69.7 pass@1), MATH-500 (93.9 pass@1), GPQA Diamond (59.1 pass@1), and LiveCodeBench (53.1 pass@1), often outperforming larger general-purpose models in specific reasoning domains.
  • Optimized for Reasoning: Designed to excel in tasks requiring logical deduction, problem-solving, and code generation, benefiting from the sophisticated reasoning data used in its fine-tuning.
  • Context Length: Supports a substantial context window of 32,768 tokens, enabling processing of longer and more complex inputs (a serving sketch follows this list).
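
To illustrate the context window and sampling setup in practice, here is a minimal sketch using vLLM. The repo id deepseek-ai/DeepSeek-R1-Distill-Qwen-14B refers to the upstream release and is an assumption; the FP8 build hosted here may be published under a different id.

```python
# Minimal sketch: serve the distill at its full 32k context with vLLM.
# Assumption: the upstream weights "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B";
# the FP8 quantization used on this platform may differ.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    max_model_len=32768,  # match the 32,768-token context window
)

params = SamplingParams(
    temperature=0.6,  # within the recommended 0.5-0.7 range (see below)
    top_p=0.95,
    max_tokens=8192,  # leave headroom for long chain-of-thought traces
)

# No system prompt: all instructions go into the user turn.
outputs = llm.chat(
    [{"role": "user", "content": "Explain, step by step, why the sum of "
                                 "two odd integers is always even."}],
    params,
)
print(outputs[0].outputs[0].text)
```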

Usage Recommendations

  • Prompting: Avoid system prompts; integrate all instructions directly into the user prompt.
  • Reasoning Enforcement: Force the model to begin its response with "<think>\n" so it reliably produces a full reasoning trace (see the sketch after this list).
  • Temperature Setting: Use a temperature of 0.5-0.7 (0.6 is recommended) to prevent repetitive or incoherent outputs.
  • Mathematical Tasks: Include "Please reason step by step, and put your final answer within \boxed{}" in math prompts.
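
The following sketch applies these recommendations with Hugging Face transformers, including the "<think>\n" prefill appended after the chat template. The repo id is again the assumed upstream release, and the generation settings follow the guidance above.

```python
# Minimal sketch: apply the usage recommendations with transformers.
# Assumption: upstream weights "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# All instructions live in the user turn; no system prompt.
messages = [{
    "role": "user",
    "content": "Please reason step by step, and put your final answer "
               "within \\boxed{}. Solve x^2 - 5x + 6 = 0.",
}]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Force the model to open its reasoning block (some tokenizer revisions
# already append this in the chat template; check before adding it twice).
prompt += "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,  # recommended setting
    top_p=0.95,
)
# Decode only the newly generated tokens (reasoning trace + final answer).
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:],
                       skip_special_tokens=True))
```

Prefilling "<think>\n" matters because the distilled R1 models occasionally skip the thinking block entirely, which degrades reasoning quality.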

Good For

  • Applications requiring strong mathematical and coding reasoning.
  • Tasks benefiting from detailed, step-by-step problem-solving.
  • Developers seeking a capable reasoning model at a more accessible parameter count than its larger counterparts.