deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 20, 2025 · License: MIT · Architecture: Transformer · Open Weights · Status: Warm

DeepSeek-R1-Distill-Qwen-32B is a 32.8-billion-parameter language model developed by DeepSeek-AI, distilled from the larger DeepSeek-R1 model and built on the Qwen2.5 architecture. It is fine-tuned on reasoning data generated by DeepSeek-R1 and excels at complex reasoning, mathematical, and coding tasks; the base model supports a context length of 131,072 tokens (served here with a 32k window). It performs strongly across benchmarks, often outperforming larger models in its class thanks to its specialized distillation process.
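
For local experimentation, the model can be loaded with Hugging Face Transformers. The snippet below is a minimal sketch: the repository ID is the official one, but the dtype/device settings and sampler values (the 0.6/0.95 pair commonly recommended for R1-distill models) are assumptions about your setup, not prescriptions.

```python
# Minimal sketch: load and query the model with Hugging Face Transformers.
# Assumes a GPU setup with enough memory for a 32B model
# (device_map="auto" will spread layers across available GPUs).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # automatic device placement
)

# R1-distill models emit their chain of thought in <think>...</think>
# before the final answer, so leave generous room in max_new_tokens.
messages = [{"role": "user", "content": "What is 17 * 23? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```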

Popular Sampler Settings

These are the three sampler configurations most commonly used by Featherless users for this model. The tunable parameters are listed below; a sketch of passing them through the API follows the list.

temperature — scales the sharpness of the output distribution; lower values are more deterministic
top_p — nucleus sampling: restricts choices to the smallest token set whose cumulative probability exceeds p
top_k — restricts sampling to the k most likely tokens
frequency_penalty — penalizes tokens in proportion to how often they have already appeared
presence_penalty — penalizes any token that has appeared at all, encouraging new topics
repetition_penalty — multiplicative penalty on previously generated tokens
min_p — discards tokens whose probability falls below a fraction of the top token's probability
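
As a rough sketch of applying these settings, the request below goes through an OpenAI-compatible client. The base URL and the convention of passing non-standard samplers (top_k, repetition_penalty, min_p) via extra_body are assumptions about the Featherless endpoint, and the parameter values are illustrative rather than one of the actual top-3 configs.

```python
# Sketch: querying the hosted model with explicit sampler settings.
# Assumes an OpenAI-compatible endpoint at api.featherless.ai (check the
# Featherless docs) and illustrative values, not a real user config.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_FEATHERLESS_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Standard OpenAI sampler parameters:
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the OpenAI schema go in extra_body; whether the
    # server honors them depends on the backend.
    extra_body={
        "top_k": 40,
        "repetition_penalty": 1.05,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```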