Neura-Tech-AI/DeepSeek-R1-Distill-Qwen-32B

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Context Length: 32k · Published: Mar 29, 2026 · License: MIT · Architecture: Transformer · Open Weights

Neura-Tech-AI/DeepSeek-R1-Distill-Qwen-32B is a 32.8-billion-parameter distilled language model developed by DeepSeek-AI, based on the Qwen2.5 architecture with a 32,768-token context length. It is fine-tuned on reasoning data generated by the larger DeepSeek-R1 model and shows improved performance on math, code, and reasoning benchmarks. The goal of the distillation is strong reasoning capability in a smaller, dense-model footprint.


Overview

DeepSeek-R1-Distill-Qwen-32B is a 32.8-billion-parameter model from DeepSeek-AI, distilled from the larger DeepSeek-R1 reasoning model. It inherits reasoning patterns discovered through large-scale reinforcement learning (RL) on DeepSeek-R1, which was trained without an initial supervised fine-tuning (SFT) stage in order to foster emergent reasoning behaviors. Distillation transfers those patterns to smaller, dense models, allowing them to match or exceed much larger models on specific reasoning tasks.

Key Capabilities

  • Enhanced Reasoning: Benefits from reasoning data generated by DeepSeek-R1, which excels in complex problem-solving across math, code, and general reasoning tasks.
  • Strong Benchmark Performance: Outperforms OpenAI-o1-mini and other models in its size class on various benchmarks, including AIME 2024 (72.6 pass@1), MATH-500 (94.3 pass@1), and LiveCodeBench (57.2 pass@1).
  • Efficient Architecture: A dense model based on Qwen2.5, offering strong performance with 32.8 billion parameters and a 32768 token context length.
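Like other models in the DeepSeek-R1 family, the distilled models emit their chain of thought between `<think>` and `</think>` tags before the final answer. A minimal sketch of separating the reasoning trace from the answer is below; the tag convention comes from the R1 family, while the helper name is our own:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, answer).

    Returns an empty reasoning string when no <think> block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Example on a toy completion:
sample = "<think>2 + 2 equals 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
```

Keeping the trace separate is useful for logging or evaluation while showing end users only the final answer.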

Good For

  • Reasoning-intensive applications: Ideal for tasks requiring robust logical deduction, mathematical problem-solving, and code generation.
  • Resource-constrained environments: Delivers strong reasoning in a far more compact footprint than very large MoE models.
  • Research and Development: Suitable for further distillation experiments or as a strong base for fine-tuning on specific reasoning datasets.
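When the model is served behind an OpenAI-compatible chat endpoint, a request can be sketched as follows. The base URL and API key are placeholders for your provider's values, and the sampling settings follow DeepSeek's published guidance for R1-series models (temperature around 0.6, all instructions in the user turn rather than a system prompt); treat them as starting points, not requirements.

```python
import json
import urllib.request

def build_chat_request(prompt: str, max_tokens: int = 2048) -> dict:
    """Build an OpenAI-compatible chat completion payload for this model.

    DeepSeek recommends putting all instructions in the user message
    (no system prompt) and a temperature of roughly 0.6 for R1 models.
    """
    return {
        "model": "Neura-Tech-AI/DeepSeek-R1-Distill-Qwen-32B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "max_tokens": max_tokens,
    }

def chat(prompt: str, base_url: str, api_key: str) -> str:
    # base_url and api_key come from your provider; the endpoint path
    # follows the OpenAI convention.
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The returned content can then be post-processed to strip or surface the model's `<think>` reasoning block, depending on the application.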