deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 32k · Published: Jan 20, 2025 · License: MIT · Architecture: Transformer

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter language model developed by DeepSeek-AI, distilled from the larger DeepSeek-R1 model and based on the Llama-3.3-70B-Instruct architecture. The model is fine-tuned on reasoning data generated by DeepSeek-R1 and excels at complex reasoning, mathematical, and coding tasks. It features a 32,768 token context length and demonstrates strong performance on benchmarks like AIME 2024 and MATH-500, making it suitable for applications requiring advanced problem-solving capabilities.


DeepSeek-R1-Distill-Llama-70B: Reasoning-Enhanced Language Model

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter model from DeepSeek-AI, part of their DeepSeek-R1 series focused on advanced reasoning. This model is a distillation of the larger DeepSeek-R1, which itself was developed using large-scale reinforcement learning (RL) directly on a base model, without initial supervised fine-tuning (SFT), to foster complex reasoning behaviors like self-verification and reflection.

Key Capabilities & Features

  • Reasoning Distillation: Leverages reasoning patterns from the powerful DeepSeek-R1 model, enabling smaller models to achieve superior performance in reasoning tasks compared to direct RL on smaller architectures.
  • Strong Performance: Achieves competitive results across various benchmarks, including:
    • AIME 2024 (Pass@1): 70.0
    • MATH-500 (Pass@1): 94.5
    • GPQA Diamond (Pass@1): 65.2
    • LiveCodeBench (Pass@1): 57.5
  • Llama-Based Architecture: Built upon the Llama-3.3-70B-Instruct model, ensuring a familiar and robust foundation.
  • Extended Context Length: Supports a context window of 32,768 tokens, beneficial for handling longer and more complex inputs.

Usage Recommendations

  • Optimal Settings: For best performance, use a temperature between 0.5 and 0.7 (0.6 recommended) and avoid system prompts, placing all instructions within the user prompt.
  • Reasoning Prompts: For mathematical problems, include directives like "Please reason step by step, and put your final answer within \boxed{}".
  • Enforced Reasoning: To ensure thorough reasoning, enforce the model to begin its response with "<think>\n", preventing it from skipping its thinking process.
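The recommendations above can be sketched as a small request builder for an OpenAI-compatible chat endpoint. The function name and the `max_tokens` value are illustrative, not part of the model card; backends that support assistant-message prefill can additionally seed the response with the thinking token.

```python
def build_request(question: str) -> dict:
    """Build a chat request following the usage recommendations:
    no system prompt, all instructions in the user turn,
    temperature 0.6, and the boxed-answer directive for math."""
    user_prompt = (
        question
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )
    return {
        # Model id as published on Hugging Face.
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
        "messages": [
            # Deliberately no {"role": "system"} entry: all
            # instructions go in the user message.
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.6,   # recommended range: 0.5-0.7
        "max_tokens": 4096,   # illustrative; leave room for the reasoning trace
    }

payload = build_request("What is the sum of the first 10 positive integers?")
```

The payload can then be sent to any OpenAI-compatible `/v1/chat/completions` endpoint serving this model.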

Good For

  • Applications requiring advanced mathematical problem-solving.
  • Complex code generation and analysis tasks.
  • Scenarios demanding robust logical reasoning and chain-of-thought capabilities.
  • Research and development in distilling large model capabilities into more manageable sizes.