unsloth/DeepSeek-R1-Distill-Qwen-32B
Hugging Face · Text Generation
Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Ctx Length: 32k · Published: Jan 20, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

The DeepSeek-R1-Distill-Qwen-32B model, developed by DeepSeek AI, is a 32-billion-parameter language model distilled from the larger DeepSeek-R1 reasoning model and based on the Qwen2.5 architecture. It is optimized for complex reasoning, mathematics, and coding, and demonstrates strong performance across the corresponding benchmarks. Distillation transfers the reasoning capabilities of the larger model into a more compact form, making this model suitable for applications that need strong reasoning without the cost of serving the full DeepSeek-R1.

DeepSeek-R1-Distill-Qwen-32B Overview

DeepSeek-R1-Distill-Qwen-32B is a 32 billion parameter model from DeepSeek AI, distilled from their larger DeepSeek-R1 reasoning model and built upon the Qwen2.5 base. This model is part of a series designed to transfer the advanced reasoning patterns of large-scale models into more efficient, smaller architectures. DeepSeek-R1 itself was developed using a novel reinforcement learning (RL) approach, initially without supervised fine-tuning (SFT), to foster emergent reasoning behaviors like self-verification and chain-of-thought generation.
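
For local experimentation, the checkpoint can be loaded like any other causal language model on the Hugging Face Hub. The following is a minimal sketch using the transformers library; it assumes a GPU setup with enough memory for a 32B checkpoint and is not an official loading recipe:

```python
# Minimal sketch: load and query the model with Hugging Face transformers.
# Assumes enough GPU memory for a 32B checkpoint (device_map shards it if needed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the precision stored in the checkpoint
    device_map="auto",    # place/shard layers across available devices
)

messages = [{"role": "user", "content": "What is 17 * 24? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```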

Key Capabilities

  • Advanced Reasoning: Excels in complex reasoning tasks, inheriting capabilities from the DeepSeek-R1 parent model.
  • Mathematical Proficiency: Achieves high scores on benchmarks such as AIME 2024 (72.6% pass@1) and MATH-500 (94.3% pass@1); the pass@1 metric is sketched after this list.
  • Code Generation: Demonstrates strong performance on coding challenges, with a LiveCodeBench pass@1 of 57.2% and a Codeforces rating of 1691.
  • Distilled Performance: Outperforms OpenAI-o1-mini in several benchmarks, showcasing the effectiveness of its distillation process.
  • Extended Context: Supports a context length of 32,768 tokens.
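
For reference, pass@1 is the average first-attempt success rate: each problem is sampled one or more times and the per-problem success rates are averaged. A minimal illustration follows; the data layout is hypothetical, not from any specific evaluation harness:

```python
# Sketch: estimate pass@1 from k sampled attempts per problem.
# results[i][j] is True if sample j for problem i was judged correct
# (hypothetical data layout for illustration).
def pass_at_1(results: list[list[bool]]) -> float:
    # Average per-problem success rate, then average across problems.
    per_problem = [sum(samples) / len(samples) for samples in results]
    return sum(per_problem) / len(per_problem)

# Example: 3 problems, 4 samples each -> (0.75 + 0.0 + 1.0) / 3 ≈ 0.583
print(pass_at_1([[True, True, False, True],
                 [False, False, False, False],
                 [True, True, True, True]]))
```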

When to Use This Model

This model is particularly well-suited for applications requiring robust reasoning, mathematical problem-solving, and code generation. Its distilled nature makes it a strong option when the reasoning performance of much larger models is needed, but with the serving efficiency of a 32B-parameter model. It is recommended for tasks demanding high accuracy in complex cognitive domains.
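
In practice, one common way to use the model for such tasks is through an OpenAI-compatible chat endpoint. The sketch below assumes Featherless's OpenAI-compatible API at api.featherless.ai/v1 and a placeholder FEATHERLESS_API_KEY environment variable; substitute your provider's values as needed:

```python
# Sketch: call the model via an OpenAI-compatible chat endpoint.
# Base URL and FEATHERLESS_API_KEY are assumptions; adjust for your provider.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key=os.environ["FEATHERLESS_API_KEY"],
)

response = client.chat.completions.create(
    model="unsloth/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    temperature=0.6,    # R1-family models are typically run around 0.5-0.7
    max_tokens=2048,
)
print(response.choices[0].message.content)
```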

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
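
As an illustration of how such a configuration can be applied, the sketch below passes the standard OpenAI fields directly and tucks the non-standard ones (top_k, repetition_penalty, min_p) into extra_body, which many OpenAI-compatible servers accept. The values shown are placeholders, not the actual Featherless user configurations:

```python
# Sketch: apply the sampler settings above in one OpenAI-compatible request.
# All values are illustrative placeholders, not the real user configs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint, as above
    api_key=os.environ["FEATHERLESS_API_KEY"],
)

response = client.chat.completions.create(
    model="unsloth/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={
        # Non-standard fields; passed through to servers that support them.
        "top_k": 40,
        "repetition_penalty": 1.05,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```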