Name: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: deepseek-ai

Model Overview

DeepSeek-R1-Distill-Qwen-14B is a 14.8 billion parameter language model from DeepSeek-AI, part of their DeepSeek-R1 series. This model is a distillation of the larger DeepSeek-R1, which was developed using large-scale reinforcement learning (RL) to enhance reasoning capabilities without initial supervised fine-tuning (SFT). The distillation process transfers the advanced reasoning patterns of DeepSeek-R1 into this smaller, dense model, built upon the Qwen2.5 architecture.

Key Capabilities

Enhanced Reasoning: Benefits from reasoning patterns distilled from DeepSeek-R1, which demonstrated capabilities like self-verification and reflection.
Strong Performance: Achieves competitive results across various benchmarks, particularly in math (AIME 2024 pass@1: 69.7, MATH-500 pass@1: 93.9) and coding (LiveCodeBench pass@1: 53.1, CodeForces rating: 1481).
Long Context: Supports a context length of 131,072 tokens, enabling processing of extensive inputs.
Distilled Efficiency: Offers powerful reasoning in a more compact form factor compared to its larger parent model.

Usage Recommendations

Prompting: Avoid system prompts; all instructions should be within the user prompt. For mathematical problems, include a directive like "Please reason step by step, and put your final answer within \boxed{}".
Temperature: Recommended temperature range is 0.5-0.7 (0.6 is ideal) to prevent repetitive or incoherent outputs.
Enforced Reasoning: To ensure thorough reasoning, it's recommended to enforce the model to start its response with "\n".

Overview

Model Overview

Key Capabilities

Usage Recommendations

Full Model Card (README)