unsloth/DeepSeek-R1-Distill-Llama-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32K · Published: Jan 20, 2025 · License: llama3.1 · Architecture: Transformer · Warm: 0.1K

DeepSeek-R1-Distill-Llama-8B is an 8-billion-parameter language model from DeepSeek AI, distilled from the much larger DeepSeek-R1 model onto a Llama-3.1-8B base. It specializes in reasoning tasks, transferring patterns learned by the 671B-parameter parent model to achieve strong performance in math, code, and general reasoning. The model offers a 32,768-token context length and is designed to bring advanced reasoning capabilities to a smaller, more efficient architecture.
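For local experimentation, the checkpoint can be loaded with the Hugging Face transformers library. A minimal sketch, assuming transformers and torch are installed and a GPU with roughly 16 GB of memory is available for the bfloat16 weights:

```python
# Minimal sketch: load DeepSeek-R1-Distill-Llama-8B with transformers.
# Assumes `pip install transformers torch` and enough GPU memory for
# an 8B model in bfloat16 (~16 GB).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 8B weights
    device_map="auto",           # spread layers across available devices
)
```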


DeepSeek-R1-Distill-Llama-8B Overview

This model is an 8-billion-parameter language model from DeepSeek AI, part of the DeepSeek-R1-Distill series. It is a distilled version of the larger DeepSeek-R1 model, which was trained with large-scale reinforcement learning (RL) to strengthen reasoning. DeepSeek-R1-Distill-Llama-8B specifically transfers reasoning patterns from the 671B-parameter DeepSeek-R1 onto a Llama-3.1-8B base.
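Since the model is hosted here, it can also be queried over an OpenAI-compatible API rather than run locally. A sketch assuming a standard chat-completions endpoint; the base URL and key handling below are assumptions, so consult the Featherless documentation for the actual details:

```python
# Hypothetical sketch: call the hosted model via an OpenAI-compatible API.
# The base URL is an assumption; check the Featherless docs for the
# actual endpoint and authentication details.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",                    # placeholder
)

response = client.chat.completions.create(
    model="unsloth/DeepSeek-R1-Distill-Llama-8B",
    messages=[{"role": "user", "content": "What is 17 * 24? Think it through."}],
)
print(response.choices[0].message.content)
```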

Key Capabilities

  • Enhanced Reasoning: Inherits advanced reasoning abilities from the DeepSeek-R1 parent model, which demonstrated self-verification, reflection, and long chain-of-thought (CoT) generation (see the parsing sketch after this list).
  • Distilled Performance: Achieves strong performance in math, code, and general reasoning benchmarks, often outperforming larger models in its class due to effective knowledge distillation.
  • Efficient Architecture: Provides powerful reasoning in a more compact 8B parameter size, making it suitable for applications where larger models might be impractical.
  • Extended Context: Supports a context length of 32,768 tokens, allowing for processing longer inputs and maintaining coherence over extended interactions.
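On the CoT point above: like the other R1 distills, this model emits its reasoning inside <think>...</think> tags before the final answer, so downstream code often wants to separate the two. A small sketch; the split_cot helper name is illustrative, not from any library:

```python
# Sketch: split an R1-style completion into reasoning and final answer.
# DeepSeek-R1 distills emit their chain of thought inside <think>...</think>;
# `split_cot` is a hypothetical helper, not part of any library.
import re

def split_cot(completion: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()  # no CoT block found
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_cot(
    "<think>17 * 24 = 17 * 25 - 17 = 408</think>The answer is 408."
)
print(answer)  # -> The answer is 408.
```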

Good For

  • Reasoning-intensive tasks: Excels in areas requiring logical deduction, problem-solving, and complex multi-step thinking.
  • Math and Code Applications: Demonstrates competitive performance in mathematical problem-solving and code-related benchmarks.
  • Resource-constrained environments: Offers a balance of strong reasoning capabilities and a relatively smaller parameter count, making it efficient for deployment.
  • Further Research and Fine-tuning: Can serve as a robust base for further fine-tuning on specific reasoning datasets, benefiting from its distilled knowledge (see the LoRA sketch after this list).
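On the fine-tuning point, one common recipe is parameter-efficient LoRA training via the peft library. A minimal sketch, assuming peft and transformers are installed; the rank, alpha, and target modules below are illustrative defaults, not values prescribed by the model card:

```python
# Minimal LoRA fine-tuning setup sketch using `peft` (one common approach;
# the card does not prescribe a fine-tuning recipe).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,           # rank of the LoRA update matrices (illustrative)
    lora_alpha=32,  # scaling factor (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights train
```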

Popular Sampler Settings

Featherless surfaces the top three sampler configurations its users apply to this model; the exact values are only available in the interactive view, but each config covers the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
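Most of these map directly onto transformers' generate() arguments (temperature, top_p, top_k, repetition_penalty, and, in recent transformers versions, min_p); frequency_penalty and presence_penalty are OpenAI-style API parameters with no direct generate() equivalent. A sketch with illustrative values, noting that DeepSeek's own usage guidance for the R1 distills recommends a temperature around 0.6 with top_p 0.95:

```python
# Sketch: applying sampler settings with transformers' generate().
# Values are illustrative, not the Featherless top configs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,          # DeepSeek suggests ~0.5-0.7 for R1 distills
    top_p=0.95,               # recommended alongside temperature 0.6
    top_k=40,                 # illustrative
    repetition_penalty=1.05,  # illustrative
    min_p=0.05,               # illustrative; needs a recent transformers
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```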