deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 32k · Published: Jan 20, 2025 · License: MIT · Architecture: Transformer

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter language model developed by DeepSeek-AI, distilled from the larger DeepSeek-R1 model and based on the Llama-3.3-70B-Instruct architecture. The model is fine-tuned on reasoning data generated by DeepSeek-R1 and excels at complex reasoning, mathematical, and coding tasks. It features a 32,768 token context length and demonstrates strong performance on benchmarks like AIME 2024 and MATH-500, making it suitable for applications requiring advanced problem-solving capabilities.


DeepSeek-R1-Distill-Llama-70B: Reasoning-Enhanced Language Model

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter model from DeepSeek-AI, part of their DeepSeek-R1 series focused on advanced reasoning. This model is a distillation of the larger DeepSeek-R1, which itself was developed using large-scale reinforcement learning (RL) directly on a base model, without initial supervised fine-tuning (SFT), to foster complex reasoning behaviors like self-verification and reflection.

Key Capabilities & Features

  • Reasoning Distillation: Leverages reasoning patterns from the powerful DeepSeek-R1 model, enabling smaller models to achieve superior performance in reasoning tasks compared to direct RL on smaller architectures.
  • Strong Performance: Achieves competitive results across various benchmarks, including:
    • AIME 2024 (Pass@1): 70.0
    • MATH-500 (Pass@1): 94.5
    • GPQA Diamond (Pass@1): 65.2
    • LiveCodeBench (Pass@1): 57.5
  • Llama-Based Architecture: Built upon the Llama-3.3-70B-Instruct model, ensuring a familiar and robust foundation.
  • Extended Context Length: Supports a context window of 32,768 tokens, beneficial for handling longer and more complex inputs.

Usage Recommendations

  • Optimal Settings: For best performance, use a temperature between 0.5 and 0.7 (0.6 recommended) and avoid system prompts, placing all instructions within the user prompt.
  • Reasoning Prompts: For mathematical problems, include directives like "Please reason step by step, and put your final answer within \boxed{}".
  • Enforced Reasoning: To ensure thorough reasoning, enforce the model to begin its response with "<think>\n", preventing it from skipping its thinking process.
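The recommendations above can be sketched as a small request builder for an OpenAI-compatible chat endpoint. The function name and the `max_tokens` value are illustrative, not part of the model card; backends that support assistant-message prefill can additionally seed the response with the thinking token.

```python
def build_request(question: str) -> dict:
    """Build a chat request following the usage recommendations:
    no system prompt, all instructions in the user turn,
    temperature 0.6, and the boxed-answer directive for math."""
    user_prompt = (
        question
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )
    return {
        # Model id as published on Hugging Face.
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
        "messages": [
            # Deliberately no {"role": "system"} entry: all
            # instructions go in the user message.
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.6,   # recommended range: 0.5-0.7
        "max_tokens": 4096,   # illustrative; leave room for the reasoning trace
    }

payload = build_request("What is the sum of the first 10 positive integers?")
```

The payload can then be sent to any OpenAI-compatible `/v1/chat/completions` endpoint serving this model.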

Good For

  • Applications requiring advanced mathematical problem-solving.
  • Complex code generation and analysis tasks.
  • Scenarios demanding robust logical reasoning and chain-of-thought capabilities.
  • Research and development in distilling large model capabilities into more manageable sizes.