deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

32.8B parameters · FP8 · 131,072-token context · MIT license · Available on Hugging Face
Overview

DeepSeek-R1-Distill-Qwen-32B is a 32.8 billion parameter model from DeepSeek-AI, part of the DeepSeek-R1 series. It is distilled from the larger DeepSeek-R1 model onto the Qwen2.5-32B base, and is designed to transfer the advanced reasoning capabilities of its larger counterpart into a more compact form. The training methodology is notable: the model is fine-tuned on reasoning data generated by DeepSeek-R1, which was developed using large-scale reinforcement learning (RL) to discover and enhance reasoning patterns; its precursor, DeepSeek-R1-Zero, showed that such patterns can emerge from RL alone, without supervised fine-tuning (SFT) as a preliminary step.
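
For a quick local trial, the model can be loaded with the standard Hugging Face transformers API. A minimal sketch, assuming transformers and accelerate are installed and enough GPU memory is available; the sampling settings mirror DeepSeek's recommended temperature range (0.5-0.7) but are otherwise illustrative:

```python
# Minimal sketch: loading and prompting the model with transformers.
# Assumes transformers + accelerate are installed and sufficient GPU
# memory is available; settings here are illustrative, not official.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available GPUs (needs accelerate)
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,  # within DeepSeek's recommended 0.5-0.7 range
    top_p=0.95,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```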

Key Capabilities

  • Enhanced Reasoning: Benefits from reasoning patterns distilled from DeepSeek-R1, which itself was trained to excel in complex problem-solving through RL; see the reasoning-trace sketch after this list.
  • Strong Performance in Math & Code: Achieves competitive results on benchmarks like AIME 2024 (72.6% pass@1), MATH-500 (94.3% pass@1), and LiveCodeBench (57.2% pass@1), often surpassing models like OpenAI-o1-mini.
  • Long Context Understanding: Supports a context length of 131,072 tokens, enabling processing of extensive inputs.
  • Distilled Efficiency: Demonstrates that smaller models can achieve high reasoning performance when effectively distilled from larger, specialized models.
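
DeepSeek-R1-series models emit their chain of thought inside <think>...</think> tags before the final answer. A minimal sketch of separating the two, assuming a single well-formed tag pair in the decoded completion; the helper name split_reasoning is ours, not part of any DeepSeek API:

```python
# Minimal sketch: splitting a DeepSeek-R1-style completion into the
# reasoning trace and the final answer. Assumes a single well-formed
# <think>...</think> block, which is the common case but not guaranteed.
def split_reasoning(text: str) -> tuple[str, str]:
    open_tag, close_tag = "<think>", "</think>"
    start = text.find(open_tag)
    end = text.find(close_tag)
    if start == -1 or end == -1:
        # No trace found; treat the whole completion as the answer.
        return "", text.strip()
    reasoning = text[start + len(open_tag):end].strip()
    answer = text[end + len(close_tag):].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>The answer is 408."
)
print(answer)  # -> The answer is 408.
```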

When to Use This Model

  • Complex Reasoning Tasks: Ideal for applications requiring advanced logical deduction, problem-solving, and multi-step reasoning.
  • Mathematical and Coding Challenges: Particularly well-suited for tasks involving mathematical problem-solving and code generation/understanding.
  • Resource-Constrained Environments: Offers a powerful reasoning engine at 32.8B parameters, making it more accessible than much larger models while retaining high performance; a served-deployment sketch follows this list.
  • Research and Development: Useful for researchers exploring distillation techniques and the transfer of reasoning capabilities from large RL-trained models to smaller, dense architectures.
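
For production-style serving, these weights are commonly deployed behind an OpenAI-compatible endpoint (e.g., vLLM). A minimal client-side sketch, assuming such a server is already running; the base_url and api_key values are placeholders for your deployment:

```python
# Minimal sketch: querying the model through an OpenAI-compatible server
# (e.g., one started with vLLM). base_url and api_key are placeholders
# for whatever your deployment actually uses.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[
        {"role": "user", "content": "Prove that the sum of two even numbers is even."}
    ],
    temperature=0.6,  # within DeepSeek's recommended 0.5-0.7 range
    max_tokens=2048,
)
print(response.choices[0].message.content)
```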