unsloth/DeepSeek-R1-Distill-Llama-70B

  • Status: Warm
  • Visibility: Public
  • Parameters: 70B
  • Quantization: FP8
  • Context length: 32,768 tokens
  • License: MIT
  • Source: Hugging Face

DeepSeek-R1-Distill-Llama-70B Overview

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter model from DeepSeek AI, part of their DeepSeek-R1 series. It is a distilled version of the larger DeepSeek-R1 model, fine-tuned on reasoning data generated by DeepSeek-R1 itself, and built upon the Llama-3.3-70B-Instruct base. This distillation process aims to imbue smaller, dense models with the advanced reasoning capabilities of their larger counterparts.

Key Capabilities & Features

  • Reasoning Transfer: Leverages reasoning patterns discovered by the 671B parameter DeepSeek-R1, which was developed using large-scale reinforcement learning (RL) without initial supervised fine-tuning (SFT).
  • Strong Performance: Achieves competitive results across various benchmarks, including AIME 2024 (70.0 pass@1), MATH-500 (94.5 pass@1), GPQA Diamond (65.2 pass@1), and LiveCodeBench (57.5 pass@1).
  • Llama-Based: Built on the Llama-3.3-70B-Instruct architecture, making it compatible with existing Llama workflows and serving stacks such as vLLM and SGLang (see the serving sketch after this list).
  • High Context Length: Supports a context length of 32,768 tokens, suitable for processing extensive inputs.
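
Since the card calls out vLLM compatibility, here is a minimal serving sketch using vLLM's offline Python API. The tensor-parallel degree and sampling values are illustrative assumptions, not settings from this repo; a 70B checkpoint needs multiple GPUs (or aggressive quantization) to load.

```python
# Hedged sketch: run unsloth/DeepSeek-R1-Distill-Llama-70B with vLLM.
# tensor_parallel_size=4 and the sampling values are assumptions; tune
# them to your hardware and workload.
from vllm import LLM, SamplingParams

llm = LLM(
    model="unsloth/DeepSeek-R1-Distill-Llama-70B",
    tensor_parallel_size=4,   # shard the 70B weights across 4 GPUs
    max_model_len=32768,      # matches the advertised context length
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

outputs = llm.chat(
    [{"role": "user", "content": "Explain why the sum of two even integers is even."}],
    params,
)
print(outputs[0].outputs[0].text)
```

SGLang exposes a similar OpenAI-compatible server for Llama-architecture checkpoints, so either stack should accept this model without adapter code.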

When to Use This Model

This model is particularly well-suited for applications requiring strong analytical and problem-solving skills. Consider DeepSeek-R1-Distill-Llama-70B for:

  • Mathematical Reasoning: Excels at complex math problems, as indicated by its high AIME and MATH-500 scores (a minimal usage sketch follows this list).
  • Code Generation & Analysis: Demonstrates robust performance in coding benchmarks like LiveCodeBench.
  • General Reasoning Tasks: Capable of handling intricate reasoning challenges across various domains.
  • Deployment in the Llama Ecosystem: Ideal for developers already working with Llama-based models, since it shares their architecture and tooling.
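
To illustrate the mathematical-reasoning use case above, here is a hedged Hugging Face transformers sketch; the prompt, generation settings, and tag-splitting helper are illustrative assumptions. R1-series distills typically wrap their chain of thought in <think>...</think> before the final answer, which is why the sketch splits on that tag.

```python
# Hedged sketch using Hugging Face transformers; loading a 70B model this
# way needs substantial GPU memory (device_map="auto" shards across GPUs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Llama-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=2048, temperature=0.6, do_sample=True)
text = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# R1-series models usually emit an explicit reasoning trace wrapped in
# <think>...</think> before the final answer; split on the closing tag
# if it is present so only the answer is printed.
if "</think>" in text:
    reasoning, answer = text.split("</think>", 1)
    print("final answer:", answer.strip())
else:
    print(text)
```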