unsloth/DeepSeek-R1-Distill-Llama-70B

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quantization: FP8 · Context Length: 32k · Published: Jan 20, 2025 · License: MIT · Architecture: Transformer · Open Weights · Warm

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter language model developed by DeepSeek AI, distilled from the larger DeepSeek-R1 model and based on Llama-3.3-70B-Instruct. This model is specifically fine-tuned using reasoning data generated by DeepSeek-R1, aiming to transfer advanced reasoning patterns to a smaller, dense architecture. It excels in complex reasoning tasks across math, code, and general English benchmarks, offering strong performance for applications requiring robust analytical capabilities.


DeepSeek-R1-Distill-Llama-70B Overview

DeepSeek-R1-Distill-Llama-70B is a 70 billion parameter model from DeepSeek AI, part of their DeepSeek-R1 series. It is a distilled version of the larger DeepSeek-R1 model, fine-tuned on reasoning data generated by DeepSeek-R1 itself, and built upon the Llama-3.3-70B-Instruct base. This distillation process aims to imbue smaller, dense models with the advanced reasoning capabilities of their larger counterparts.

Key Capabilities & Features

  • Reasoning Transfer: Leverages reasoning patterns discovered by the 671B parameter DeepSeek-R1, which was developed using large-scale reinforcement learning (RL) without initial supervised fine-tuning (SFT).
  • Strong Performance: Achieves competitive results across various benchmarks, including AIME 2024 (70.0 pass@1), MATH-500 (94.5 pass@1), GPQA Diamond (65.2 pass@1), and LiveCodeBench (57.5 pass@1).
  • Llama-Based: Built on the Llama-3.3-70B-Instruct architecture, making it compatible with existing Llama workflows and tools like vLLM and SGLang.
  • High Context Length: Supports a context length of 32,768 tokens, suitable for processing extensive inputs.
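Because the reasoning transfer described above comes from DeepSeek-R1, completions from this distill conventionally begin with a chain-of-thought trace wrapped in `<think>...</think>` tags before the final answer. A minimal sketch of separating the two, assuming that default tag convention (some serving stacks strip the tags, in which case the whole completion is the answer):

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate the <think>...</think> reasoning trace from the final answer.

    Returns (reasoning, answer); reasoning is empty when no think block
    is present in the completion.
    """
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 is 4 because ...</think>\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
```

Keeping the trace separate lets an application log or display the reasoning while passing only the final answer downstream.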

When to Use This Model

This model is particularly well-suited for applications requiring strong analytical and problem-solving skills. Consider DeepSeek-R1-Distill-Llama-70B for:

  • Mathematical Reasoning: Excels in complex math problems, as indicated by high scores on AIME and MATH-500.
  • Code Generation & Analysis: Demonstrates robust performance in coding benchmarks like LiveCodeBench.
  • General Reasoning Tasks: Capable of handling intricate reasoning challenges across various domains.
  • Deployment with Llama Ecosystem: Ideal for developers already working with Llama-based models due to its architectural foundation.
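Since Llama-ecosystem stacks such as vLLM and SGLang expose OpenAI-compatible endpoints, using the model for a task like mathematical reasoning reduces to sending a standard chat-completion payload. A sketch of one such request body; the model ID follows this page's slug and should be checked against your provider, and the field values are illustrative:

```python
import json

# Hypothetical chat-completion payload for an OpenAI-compatible endpoint.
# Verify the model ID and supported fields against your serving stack.
payload = {
    "model": "unsloth/DeepSeek-R1-Distill-Llama-70B",
    "messages": [
        # DeepSeek advises placing all instructions in the user turn
        # rather than a system prompt for R1-series models.
        {"role": "user", "content": "Solve x^2 - 5x + 6 = 0. Show your reasoning."}
    ],
    # Leave headroom for the reasoning trace, which can be long.
    "max_tokens": 2048,
}

body = json.dumps(payload)
```

Reasoning traces routinely run to hundreds or thousands of tokens, so `max_tokens` should be set well above what the final answer alone would need.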

Popular Sampler Settings

Configurable sampler parameters for this model: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
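As a starting point for these parameters, DeepSeek's own usage notes for R1-series models suggest a temperature around 0.5–0.7 (0.6 recommended) with top_p 0.95; lower temperatures tend toward repetition, higher ones toward incoherence. A sketch of a full sampler config — temperature and top_p follow that published recommendation, while the remaining values are neutral illustrative defaults, not values taken from this page:

```python
# Illustrative sampler settings for DeepSeek-R1-Distill-Llama-70B.
# temperature/top_p follow DeepSeek's published recommendation for
# R1-series models; the rest are common neutral defaults, shown here
# only to cover every parameter listed above.
sampler = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,               # illustrative
    "frequency_penalty": 0.0,  # neutral: no frequency penalty
    "presence_penalty": 0.0,   # neutral: no presence penalty
    "repetition_penalty": 1.0, # neutral: multiplicative identity
    "min_p": 0.0,              # disabled
}

# Sanity-check ranges before sending the request.
assert 0.0 <= sampler["temperature"] <= 2.0
assert 0.0 < sampler["top_p"] <= 1.0
```

The penalty parameters interact with the model's reasoning style: aggressive repetition penalties can truncate long chain-of-thought traces, so it is usually safer to tune temperature first.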