unsloth/DeepSeek-R1-Distill-Qwen-7B

  • Status: Warm
  • Visibility: Public
  • Parameters: 7.6B
  • Precision: FP8
  • Context length: 131,072 tokens
  • License: apache-2.0
  • Source: Hugging Face
Overview

DeepSeek-R1-Distill-Qwen-7B is a 7.6-billion-parameter dense model developed by DeepSeek AI, one of a series of distilled models derived from the larger DeepSeek-R1. DeepSeek-R1 itself is a Mixture-of-Experts (MoE) model with 671 billion total parameters (37 billion activated per token), trained to excel at reasoning tasks primarily through large-scale reinforcement learning (RL); its precursor, DeepSeek-R1-Zero, was trained with RL alone, without supervised fine-tuning (SFT) as a preliminary step.

Key Capabilities

  • Reasoning Distillation: The model is fine-tuned on reasoning traces generated by DeepSeek-R1, transferring much of the larger model's step-by-step analytical behavior to a smaller, dense architecture. The aim is strong reasoning performance in a far more compact form factor.
  • Performance: The model shows competitive results on various benchmarks, including AIME 2024 (55.5 pass@1), MATH-500 (92.8 pass@1), and LiveCodeBench (37.6 pass@1), indicating proficiency in mathematical and coding reasoning.
  • Base Model: It is built upon the Qwen2.5-Math-7B architecture, inheriting its foundational language understanding and generation capabilities.
  • Context Length: Supports a context length of 131,072 tokens, which accommodates long documents and lengthy chain-of-thought outputs (a minimal inference sketch follows this list).
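
Below is a minimal inference sketch, not part of the upstream model card. It assumes the repository id above loads with the Hugging Face transformers library, that the bundled tokenizer ships the DeepSeek-R1 chat template, and that the model emits its chain of thought inside <think> tags, as the R1 distills are documented to do. The sampling settings mirror DeepSeek's published recommendation (temperature around 0.6, all instructions in the user turn rather than a system prompt) and are assumptions here, not values stated on this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # or float16 on GPUs without bf16 support
    device_map="auto",
)

# Upstream guidance for the R1 distills is to put all instructions in the user
# turn (no system prompt); the chat template adds the reasoning scaffold.
messages = [{"role": "user", "content": "What is 17 * 24? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,   # recommended range is roughly 0.5-0.7
    top_p=0.95,
)
text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# The model thinks inside <think>...</think>; keep only the final answer here.
answer = text.split("</think>")[-1].strip() if "</think>" in text else text
print(answer)
```

Greedy decoding is reportedly prone to repetition with these distills, which is why the sketch samples in the recommended temperature range rather than decoding deterministically.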

Good for

  • Reasoning-intensive applications: Ideal for tasks requiring strong logical deduction, problem-solving, and complex analytical thinking, particularly in mathematics and code generation.
  • Resource-constrained environments: As a distilled model, it offers a more efficient alternative to larger reasoning models while retaining significant capabilities.
  • Research and Development: Provides a robust base for further fine-tuning or experimentation in reasoning-focused LLM applications (a minimal LoRA fine-tuning sketch follows this list).
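
To illustrate the fine-tuning use case, the sketch below attaches LoRA adapters using the standard transformers + peft stack. Treat everything in it as an assumption rather than an official recipe: the adapter rank, target modules, training arguments, and the two toy training examples are placeholders for a real dataset of reasoning traces and properly tuned hyperparameters.

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "unsloth/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:           # fall back to EOS if no pad token is defined
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA keeps the 7.6B base weights frozen and trains small adapter matrices,
# which is what makes single-GPU fine-tuning of this model practical.
model = get_peft_model(model, LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical Qwen2 attention projections
    task_type="CAUSAL_LM",
))

# Toy placeholder data: two reasoning-style examples with a "text" field.
examples = [
    {"text": "Question: What is 12 * 13?\nAnswer: <think>12 * 13 = 156.</think> 156"},
    {"text": "Question: Is 97 prime?\nAnswer: <think>No divisor up to 9 divides 97.</think> Yes"},
]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="r1-distill-qwen-7b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=1,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)
trainer.train()
model.save_pretrained("r1-distill-qwen-7b-lora")  # saves only the LoRA adapters, a tiny fraction of the full model
```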