DeepSeek-R1-Distill-Qwen-14B Overview
This model is a 14-billion-parameter distilled version of DeepSeek AI's DeepSeek-R1, built on the Qwen2.5 architecture. DeepSeek-R1 itself is a 671-billion total parameter (37 billion activated) Mixture-of-Experts (MoE) model trained with large-scale reinforcement learning (RL) to excel at reasoning tasks; its precursor, DeepSeek-R1-Zero, was trained with RL alone, without supervised fine-tuning (SFT) as a preliminary step.
Key Capabilities
- Reasoning Distillation: Fine-tuned on reasoning traces generated by the full DeepSeek-R1 model, demonstrating that complex reasoning patterns can be transferred effectively to smaller dense models.
- Enhanced Performance: Achieves strong results across reasoning benchmarks, particularly in math (AIME 2024 pass@1: 69.7, MATH-500 pass@1: 93.9) and coding (LiveCodeBench pass@1: 53.1, Codeforces rating: 1481), outperforming models such as GPT-4o-0513 and Claude-3.5-Sonnet-1022 on several of these evaluations.
- Qwen2.5 Base: Built on the Qwen2.5 series, inheriting its foundational language understanding and generation capabilities.
- Extended Context: Supports a context length of 32,768 tokens, suitable for long inputs, detailed problem descriptions, and the model's lengthy chain-of-thought outputs (see the inference sketch below).
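The following is a minimal inference sketch, assuming the Hugging Face model ID deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, the standard transformers chat-template API, and a GPU setup with enough memory for the 14B weights in bf16; the sampling values reflect the temperature range suggested in the upstream DeepSeek-R1 usage notes and are not the only valid choice.

```python
# Minimal inference sketch (assumptions: model ID, bf16 weights, sufficient GPU memory;
# quantization or multi-GPU loading may be needed in practice).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The distilled R1 models are typically prompted without a system message;
# the chat template wraps the user turn, and the model emits its chain of
# thought before the final answer.
messages = [
    {"role": "user", "content": "What is the sum of the first 50 odd numbers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=4096,   # reasoning traces can be long; budget output tokens accordingly
    temperature=0.6,       # roughly the 0.5-0.7 range suggested upstream
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```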
Good For
- Reasoning-Intensive Applications: Ideal for tasks requiring strong logical deduction, problem-solving, and chain-of-thought generation.
- Math and Code Generation: Excels in mathematical problem-solving and code-related benchmarks, making it suitable for technical domains.
- Resource-Efficient Deployment: As a distilled dense model, it offers a more efficient alternative to the full DeepSeek-R1 while retaining much of its reasoning ability, making it suitable for environments with computational constraints (see the serving sketch after this list).
- Research and Development: Provides a valuable open-source resource for further research into model distillation and reasoning capabilities.
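As one illustration of resource-efficient serving, the sketch below uses vLLM's offline Python API; the choice of vLLM, the tensor_parallel_size value, and the max_tokens budget are assumptions to be tuned to the available hardware, and any OpenAI-compatible server or quantized transformers setup would work just as well.

```python
# Offline serving sketch with vLLM (assumed dependency: `pip install vllm`).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    max_model_len=32768,     # matches the advertised context length
    tensor_parallel_size=2,  # split the 14B weights across two GPUs; tune to hardware
)

sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)

# llm.chat (available in recent vLLM releases) applies the model's chat template
# to a plain user message and returns a list of RequestOutput objects.
outputs = llm.chat(
    [{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    sampling,
)
print(outputs[0].outputs[0].text)
```

On a single large GPU, tensor_parallel_size can be dropped; conversely, tighter memory budgets usually call for a quantized checkpoint or a smaller max_model_len.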