Name: unsloth/DeepSeek-R1-Distill-Qwen-14B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: unsloth

DeepSeek-R1-Distill-Qwen-14B Overview

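This model is a 14-billion-parameter distillation of DeepSeek AI's DeepSeek-R1, built on the Qwen2.5 architecture: a Qwen2.5-14B base model fine-tuned on reasoning data generated by DeepSeek-R1. DeepSeek-R1 itself is a Mixture-of-Experts (MoE) model with 671 billion total parameters, of which 37 billion are activated per token. It was trained with large-scale reinforcement learning (RL) to excel at reasoning tasks. Its precursor, DeepSeek-R1-Zero, was trained with RL alone, without an initial supervised fine-tuning (SFT) stage; DeepSeek-R1 adds a small amount of cold-start data before RL.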
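
The two MoE parameter counts matter for different things. Weight storage scales with the total count, because every expert must stay resident, while per-token compute scales with the activated count. The sketch below makes this concrete under stated assumptions. It uses the nominal counts quoted above (the real checkpoints differ slightly), assumes bf16 weights at 2 bytes per parameter, and counts weights only, ignoring the KV cache and activations. The helper name `weight_memory_gb` is illustrative.

```python
# Bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

# Nominal parameter counts from the overview; real checkpoints differ slightly.
DISTILL_PARAMS = 14e9      # dense: every parameter is used for every token
R1_TOTAL_PARAMS = 671e9    # MoE: all experts must be stored
R1_ACTIVE_PARAMS = 37e9    # MoE: parameters actually used per token


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); ignores KV cache and activations."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Storage scales with total parameters...
    print(f"Distill-14B weights (bf16): {weight_memory_gb(DISTILL_PARAMS):.0f} GB")
    print(f"DeepSeek-R1 weights (bf16): {weight_memory_gb(R1_TOTAL_PARAMS):.0f} GB")
    # ...while per-token compute scales with activated parameters.
    print(f"R1 parameters active per token: {R1_ACTIVE_PARAMS / R1_TOTAL_PARAMS:.1%}")
```

At the same precision this gives about 28 GB of weights for the distilled model versus about 1342 GB for DeepSeek-R1, even though R1 activates only about 5.5% of its parameters per token.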

Key Capabilities

Reasoning Distillation: Fine-tuned on reasoning samples generated by the larger DeepSeek-R1 model, demonstrating that complex reasoning patterns can be transferred effectively to much smaller models.
Enhanced Performance: Achieves strong results on math benchmarks (AIME 2024 pass@1: 69.7, MATH-500 pass@1: 93.9) and coding benchmarks (LiveCodeBench pass@1: 53.1, CodeForces rating: 1481), outperforming GPT-4o-0513 and Claude-3.5-Sonnet-1022 on each of these metrics in DeepSeek's reported comparison.
Qwen2.5 Base: Built on the Qwen2.5 series, inheriting its foundational language understanding and generation capabilities.
Extended Context: Supports a context length of 32768 tokens, suitable for processing longer inputs and complex problem descriptions.
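
The pass@1 figures above are averages, not single-shot scores. As we understand DeepSeek's evaluation setup, several responses are sampled per problem and their correctness is averaged. The following is a minimal sketch of that computation. It assumes each sampled response has already been judged correct or incorrect; the function name `pass_at_1` and the toy data are hypothetical.

```python
from statistics import mean


def pass_at_1(per_problem_results: list[list[bool]]) -> float:
    """Estimate pass@1 as the mean, over problems, of the fraction of correct samples.

    Each inner list holds the correctness of the k responses sampled for one problem.
    """
    return mean(sum(samples) / len(samples) for samples in per_problem_results)


if __name__ == "__main__":
    # Hypothetical toy data: 2 problems, 4 sampled responses each.
    results = [
        [True, False, True, True],    # 3 of 4 correct
        [False, False, True, False],  # 1 of 4 correct
    ]
    # Scores like "69.7" are reported on a 0-100 scale.
    print(f"pass@1 = {100 * pass_at_1(results):.1f}")
```

Averaging over many samples reduces the variance that a single sampled answer per problem would introduce when decoding with a non-zero temperature.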

Good For

Reasoning-Intensive Applications: Ideal for tasks requiring strong logical deduction, problem-solving, and chain-of-thought generation.
Math and Code Generation: Excels in mathematical problem-solving and code-related benchmarks, making it suitable for technical domains.
Resource-Efficient Deployment: As a 14B dense model, it is far cheaper to serve than the 671B-parameter DeepSeek-R1 while retaining much of its reasoning ability, which suits environments with limited compute.
Research and Development: Provides a valuable open-source resource for further research into model distillation and reasoning capabilities.
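
For chain-of-thought workloads, callers usually want the reasoning and the final answer separately. DeepSeek-R1-family models typically emit their reasoning inside `<think>...</think>` tags before the answer. The sketch below assumes that format. It also handles the case where a chat template places the opening `<think>` tag in the prompt, so that the completion contains only the closing tag. `split_reasoning` is a hypothetical helper, not part of any Featherless or Hugging Face API.

```python
def split_reasoning(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Assumes the reasoning ends with a closing </think> tag. The opening <think> tag is
    optional because some chat templates put it in the prompt rather than the completion.
    """
    head, sep, tail = completion.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole completion as the answer.
        return "", completion.strip()
    # Drop the opening tag if the completion includes it.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Splitting on the first `</think>` with `str.partition` avoids a regular expression. `str.removeprefix` requires Python 3.9 or later.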
