deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 20, 2025 · License: MIT · Architecture: Transformer · Open Weights · Status: Warm

DeepSeek-R1-Distill-Qwen-32B is a 32.8-billion-parameter language model developed by DeepSeek-AI, distilled from the larger DeepSeek-R1 model and built on the Qwen2.5 architecture. It is fine-tuned on reasoning data generated by DeepSeek-R1 and excels at complex reasoning, mathematical, and coding tasks; the base model supports a context length of 131,072 tokens (served here with a 32k window). It performs strongly across benchmarks, often outperforming larger models in its class thanks to its specialized distillation process.
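
For local experimentation, the model can be loaded with Hugging Face Transformers. The snippet below is a minimal sketch: the repository ID is the official one, but the dtype/device settings and sampler values (the 0.6/0.95 pair commonly recommended for R1-distill models) are assumptions about your setup, not prescriptions.

```python
# Minimal sketch: load and query the model with Hugging Face Transformers.
# Assumes a GPU setup with enough memory for a 32B model
# (device_map="auto" will spread layers across available GPUs).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # automatic device placement
)

# R1-distill models emit their chain of thought in <think>...</think>
# before the final answer, so leave generous room in max_new_tokens.
messages = [{"role": "user", "content": "What is 17 * 23? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```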

Popular Sampler Settings

These are the three sampler configurations most commonly used by Featherless users for this model. The tunable parameters are listed below; a sketch of passing them through the API follows the list.

temperature — scales the sharpness of the output distribution; lower values are more deterministic
top_p — nucleus sampling: restricts choices to the smallest token set whose cumulative probability exceeds p
top_k — restricts sampling to the k most likely tokens
frequency_penalty — penalizes tokens in proportion to how often they have already appeared
presence_penalty — penalizes any token that has appeared at all, encouraging new topics
repetition_penalty — multiplicative penalty on previously generated tokens
min_p — discards tokens whose probability falls below a fraction of the top token's probability
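
As a rough sketch of applying these settings, the request below goes through an OpenAI-compatible client. The base URL and the convention of passing non-standard samplers (top_k, repetition_penalty, min_p) via extra_body are assumptions about the Featherless endpoint, and the parameter values are illustrative rather than one of the actual top-3 configs.

```python
# Sketch: querying the hosted model with explicit sampler settings.
# Assumes an OpenAI-compatible endpoint at api.featherless.ai (check the
# Featherless docs) and illustrative values, not a real user config.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_FEATHERLESS_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Standard OpenAI sampler parameters:
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the OpenAI schema go in extra_body; whether the
    # server honors them depends on the backend.
    extra_body={
        "top_k": 40,
        "repetition_penalty": 1.05,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```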