Sherckuith/DeepSeek-R1-Distill-Llama-70B

Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Context Length: 32k · Published: Apr 18, 2026 · License: MIT · Architecture: Transformer · Open Weights

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter language model developed by DeepSeek-AI, distilled from the DeepSeek-R1 reasoning model and based on Llama-3.3-70B-Instruct. This model is specifically fine-tuned using reasoning data generated by DeepSeek-R1, aiming to transfer advanced reasoning patterns to a smaller, dense architecture. It excels in complex reasoning, mathematical, and coding tasks, demonstrating strong performance across various benchmarks.


DeepSeek-R1-Distill-Llama-70B: Reasoning Distillation

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter model developed by DeepSeek-AI, part of a series of models distilled from the larger DeepSeek-R1. DeepSeek-R1 itself is a first-generation reasoning model trained primarily through large-scale reinforcement learning (RL) to develop advanced reasoning capabilities without initial supervised fine-tuning (SFT).

Key Capabilities & Differentiators

  • Reasoning Distillation: This model is fine-tuned using reasoning data generated by the powerful DeepSeek-R1, demonstrating that complex reasoning patterns from larger models can be effectively transferred to smaller, dense architectures.
  • Strong Performance: Benchmarks show this distilled model achieves competitive results in math (AIME 2024 pass@1: 70.0, MATH-500 pass@1: 94.5), code (LiveCodeBench pass@1: 57.5, CodeForces rating: 1633), and general reasoning tasks (GPQA Diamond pass@1: 65.2).
  • Llama-3.3 Base: Built upon the Llama-3.3-70B-Instruct base model, leveraging its robust foundation.
  • Context Length: Supports a context length of 32,768 tokens.
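For a concrete starting point, the model can be served locally. The command below is a minimal sketch assuming vLLM and a multi-GPU node; the tensor-parallel degree and other flag values are illustrative assumptions, not official settings from this card.

```shell
# Illustrative vLLM launch (assumes vLLM is installed and the weights are
# available under this Hugging Face model ID; adjust GPU count to your node).
vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --quantization fp8
```

This exposes an OpenAI-compatible endpoint, so the usage recommendations below can be applied through standard chat-completion requests.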

Usage Recommendations

  • Prompting: Avoid system prompts; include all instructions within the user prompt. For mathematical problems, include a directive like "Please reason step by step, and put your final answer within \boxed{}".
  • Reasoning Enforcement: The model occasionally skips its thinking step; to ensure thorough reasoning, enforce it to begin every output with "<think>\n".
  • Temperature: Set temperature between 0.5-0.7 (0.6 recommended) to prevent repetitive or incoherent outputs.
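The recommendations above can be sketched as a small request builder. This is a hedged example: the model ID and the OpenAI-style message and parameter names are assumptions about the serving stack, not part of this card.

```python
# Sketch of a chat request that follows the usage recommendations:
# no system prompt, all instructions in the user turn, temperature 0.6.

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"  # assumed hub ID

MATH_DIRECTIVE = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)

def build_request(question: str, math: bool = False) -> dict:
    """Build an OpenAI-style chat request payload."""
    content = question
    if math:
        # Append the recommended directive for mathematical problems.
        content = f"{question}\n{MATH_DIRECTIVE}"
    return {
        "model": MODEL_ID,
        # No system message: every instruction lives in the user prompt.
        "messages": [{"role": "user", "content": content}],
        "temperature": 0.6,  # recommended midpoint of the 0.5-0.7 range
        "max_tokens": 4096,
    }

req = build_request("What is 7 * 8?", math=True)
print(req["messages"][0]["content"])
```

Forcing the "<think>\n" prefix is done on the serving side (e.g. by prepending it to the assistant turn), so it is not part of the request payload itself.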

This model is ideal for applications requiring strong reasoning, mathematical problem-solving, and code generation, especially when seeking to leverage advanced reasoning capabilities in a Llama-based architecture.