jpiotrowski/DeepSeek-R1-Distill-Qwen-14B

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 15, 2026 · License: MIT · Architecture: Transformer · Open Weights · Cold

jpiotrowski/DeepSeek-R1-Distill-Qwen-14B is a 14.8-billion-parameter language model distilled from DeepSeek-R1, developed by DeepSeek-AI, and based on the Qwen2.5 architecture. It is fine-tuned on reasoning data generated by the much larger DeepSeek-R1, which was itself trained through large-scale reinforcement learning without an initial supervised fine-tuning stage. The model performs strongly on math, code, and general English and Chinese benchmarks, demonstrating that complex reasoning patterns can be transferred effectively to smaller dense models. With a 32,768-token context length, it suits applications that demand robust analytical capability.


DeepSeek-R1-Distill-Qwen-14B: Reasoning Capabilities in a Compact Model

This model is part of the DeepSeek-R1-Distill series from DeepSeek-AI, which focuses on transferring advanced reasoning capabilities from larger models into more efficient, smaller architectures. DeepSeek-R1-Distill-Qwen-14B is a 14.8-billion-parameter model built on the Qwen2.5 base and fine-tuned on high-quality reasoning data generated by DeepSeek-R1.

Key Capabilities & Features

  • Reasoning Distillation: Leverages reasoning patterns from the 671B-parameter DeepSeek-R1, which was developed using large-scale reinforcement learning (RL) to discover complex chain-of-thought (CoT) reasoning without initial supervised fine-tuning (SFT).
  • Enhanced Performance: Achieves strong results on various benchmarks, including AIME 2024 (69.7 pass@1), MATH-500 (93.9 pass@1), GPQA Diamond (59.1 pass@1), and LiveCodeBench (53.1 pass@1), often outperforming larger general-purpose models in specific reasoning domains.
  • Optimized for Reasoning: Designed to excel in tasks requiring logical deduction, problem-solving, and code generation, benefiting from the sophisticated reasoning data used in its fine-tuning.
  • Context Length: Supports a substantial context window of 32,768 tokens, enabling processing of longer and more complex inputs (a serving sketch follows this list).
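
To illustrate the context window and sampling setup in practice, here is a minimal sketch using vLLM. The repo id deepseek-ai/DeepSeek-R1-Distill-Qwen-14B refers to the upstream release and is an assumption; the FP8 build hosted here may be published under a different id.

```python
# Minimal sketch: serve the distill at its full 32k context with vLLM.
# Assumption: the upstream weights "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B";
# the FP8 quantization used on this platform may differ.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    max_model_len=32768,  # match the 32,768-token context window
)

params = SamplingParams(
    temperature=0.6,  # within the recommended 0.5-0.7 range (see below)
    top_p=0.95,
    max_tokens=8192,  # leave headroom for long chain-of-thought traces
)

# No system prompt: all instructions go into the user turn.
outputs = llm.chat(
    [{"role": "user", "content": "Explain, step by step, why the sum of "
                                 "two odd integers is always even."}],
    params,
)
print(outputs[0].outputs[0].text)
```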

Usage Recommendations

  • Prompting: Avoid system prompts; integrate all instructions directly into the user prompt.
  • Reasoning Enforcement: Force the model to begin its response with "<think>\n" so it reliably produces a full reasoning trace (see the sketch after this list).
  • Temperature Setting: Use a temperature of 0.5-0.7 (0.6 is recommended) to prevent repetitive or incoherent outputs.
  • Mathematical Tasks: Include "Please reason step by step, and put your final answer within \boxed{}" in math prompts.
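
The following sketch applies these recommendations with Hugging Face transformers, including the "<think>\n" prefill appended after the chat template. The repo id is again the assumed upstream release, and the generation settings follow the guidance above.

```python
# Minimal sketch: apply the usage recommendations with transformers.
# Assumption: upstream weights "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# All instructions live in the user turn; no system prompt.
messages = [{
    "role": "user",
    "content": "Please reason step by step, and put your final answer "
               "within \\boxed{}. Solve x^2 - 5x + 6 = 0.",
}]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Force the model to open its reasoning block (some tokenizer revisions
# already append this in the chat template; check before adding it twice).
prompt += "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,  # recommended setting
    top_p=0.95,
)
# Decode only the newly generated tokens (reasoning trace + final answer).
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:],
                       skip_special_tokens=True))
```

Prefilling "<think>\n" matters because the distilled R1 models occasionally skip the thinking block entirely, which degrades reasoning quality.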

Good For

  • Applications requiring strong mathematical and coding reasoning.
  • Tasks benefiting from detailed, step-by-step problem-solving.
  • Developers seeking a capable reasoning model at a more accessible parameter count than its larger counterparts.