Ujjwal-Tyagi/DeepSeek-R1-Distill-Qwen-32B

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 29, 2026 · License: MIT · Architecture: Transformer · Open Weights

DeepSeek-R1-Distill-Qwen-32B is a 32.8 billion parameter distilled language model developed by DeepSeek-AI and based on Qwen2.5-32B. It is fine-tuned on reasoning data generated by the larger DeepSeek-R1 model and excels at mathematical, coding, and general reasoning tasks. The model demonstrates that powerful reasoning patterns can be transferred effectively to smaller, dense models, offering performance comparable to larger proprietary models.


Overview

DeepSeek-R1-Distill-Qwen-32B is a 32.8 billion parameter model from DeepSeek-AI, part of their DeepSeek-R1-Distill series. This model is a distillation of the larger DeepSeek-R1, which was developed using large-scale reinforcement learning (RL) to enhance reasoning capabilities without initial supervised fine-tuning (SFT). The distillation process transfers the advanced reasoning patterns of DeepSeek-R1 into smaller, dense models like this Qwen-based variant.

Key Capabilities

  • Enhanced Reasoning: Benefits from reasoning patterns discovered by the larger DeepSeek-R1 model, which was trained to explore chain-of-thought (CoT) for complex problem-solving.
  • Strong Performance: Achieves competitive results across various benchmarks, particularly in math, code, and general reasoning tasks, outperforming some larger models in specific areas.
  • Distilled Efficiency: Demonstrates that powerful reasoning can be effectively distilled into smaller models, making high-performance reasoning more accessible.
  • Context Length: Supports a context length of 32,768 tokens (see the usage sketch after this list).
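As a rough illustration, the snippet below loads the checkpoint with Hugging Face transformers and runs a single chat-style generation. The repo id is taken from this page's title, and the sampling settings (temperature 0.6, top_p 0.95) and token budget are assumptions rather than values stated here; treat it as a minimal sketch, not official usage guidance.

```python
# Minimal sketch: load the model and run one chat-style generation.
# Repo id comes from this page's title; sampling values are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ujjwal-Tyagi/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread the 32.8B weights across available GPUs
)

messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,  # leave room for the chain-of-thought
    do_sample=True,
    temperature=0.6,      # assumed sampling settings, not from this page
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```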

Good For

  • Reasoning-intensive applications: Ideal for tasks requiring logical deduction, problem-solving, and multi-step thinking; a sketch for separating the model's reasoning from its final answer appears after this list.
  • Mathematical and Coding tasks: Shows strong performance in benchmarks like AIME 2024, MATH-500, LiveCodeBench, and Codeforces.
  • Resource-constrained environments: Offers powerful reasoning capabilities in a 32.8B parameter dense model, potentially more efficient than much larger sparse models.
  • Research and Development: Provides a strong base for further research into model distillation and reasoning enhancement.
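For reasoning-heavy use, it is often helpful to separate the model's intermediate chain-of-thought from its final answer. The helper below assumes the generated text wraps its reasoning in `<think>...</think>` tags, a convention commonly used by DeepSeek-R1-style models; that output format is an assumption, not something stated on this page.

```python
# Sketch of splitting a completion into reasoning and answer, assuming the
# model wraps its chain-of-thought in <think>...</think> tags (assumption).
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        # No explicit reasoning block: treat the whole completion as the answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>The answer is 408."
)
print(answer)  # -> "The answer is 408."
```

Keeping the reasoning text around (for logging or evaluation) while showing users only the final answer is a common pattern when deploying chain-of-thought models.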