unsloth/DeepSeek-R1-Distill-Qwen-14B

  • Status: Warm
  • Visibility: Public
  • Parameters: 14B
  • Precision: FP8
  • Context length: 32768 tokens
  • License: apache-2.0
  • Source: Hugging Face

Overview

This model is a 14-billion-parameter distilled version of DeepSeek AI's DeepSeek-R1, created by fine-tuning a Qwen2.5 base model on reasoning data generated by DeepSeek-R1. DeepSeek-R1 itself is a Mixture-of-Experts (MoE) model with 671 billion total parameters (37 billion activated per token), whose reasoning ability was developed primarily through large-scale reinforcement learning (RL); its precursor, DeepSeek-R1-Zero, was trained with RL alone, without supervised fine-tuning (SFT) as a preliminary step.
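
As an illustration, here is a minimal loading sketch using the Hugging Face transformers library; the repository id is taken from this page's title, and the dtype and device settings are assumptions to adjust for your hardware.

```python
# Minimal loading sketch (assumes `transformers` and `torch` are installed;
# dtype and device placement are illustrative, not required settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Qwen-14B"  # repository id from this page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to float16 if bf16 is unsupported
    device_map="auto",           # shard across available GPUs / offload to CPU
)
```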

Key Capabilities

  • Reasoning Distillation: Leverages reasoning patterns from the larger DeepSeek-R1 model, demonstrating that complex reasoning can be effectively transferred to smaller models.
  • Enhanced Performance: Achieves strong results across various benchmarks, particularly in math (AIME 2024 pass@1: 69.7, MATH-500 pass@1: 93.9) and coding (LiveCodeBench pass@1: 53.1, CodeForces rating: 1481), often outperforming models like GPT-4o-0513 and Claude-3.5-Sonnet-1022 in specific reasoning metrics.
  • Qwen2.5 Base: Built on the Qwen2.5 series, inheriting its foundational language understanding and generation capabilities.
  • Extended Context: Supports a context length of 32768 tokens, suitable for processing longer inputs and complex problem descriptions (a generation sketch follows this list).
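
Continuing from the loading sketch above, the following is a hedged example of reasoning-style generation. The sampling values (temperature around 0.6, top_p 0.95, no system prompt) follow DeepSeek's published usage recommendations for the R1 series and are a starting point rather than a requirement.

```python
# Generation sketch (continues from the loading example above; the prompt and
# sampling values are illustrative). The model emits its chain of thought
# inside <think>...</think> tags before the final answer.
messages = [
    {"role": "user",
     "content": "Please reason step by step: how many primes are there below 30?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,      # well within the 32768-token context window
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```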

Good For

  • Reasoning-Intensive Applications: Ideal for tasks requiring strong logical deduction, problem-solving, and chain-of-thought generation.
  • Math and Code Generation: Excels in mathematical problem-solving and code-related benchmarks, making it suitable for technical domains.
  • Resource-Efficient Deployment: As a distilled model, it offers a more efficient alternative to larger models while retaining much of their reasoning capability, making it suitable for environments with computational constraints (a serving sketch follows this list).
  • Research and Development: Provides a valuable open-source resource for further research into model distillation and reasoning capabilities.
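
As one illustration of resource-efficient deployment, below is a minimal offline-inference sketch with vLLM; the library choice, the repository id (taken from this page's title), and the sampling values are assumptions to adapt to your own serving stack.

```python
# Offline-inference sketch with vLLM (assumed installed; values illustrative).
from vllm import LLM, SamplingParams

llm = LLM(
    model="unsloth/DeepSeek-R1-Distill-Qwen-14B",  # repository id from this page
    max_model_len=32768,                           # full advertised context window
)
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

# For best results the tokenizer's chat template should be applied to the
# prompt; a raw prompt is used here only to keep the sketch short.
outputs = llm.generate(
    ["Solve step by step: if 3x + 7 = 22, what is x?"],
    sampling,
)
print(outputs[0].outputs[0].text)
```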