unsloth/DeepSeek-R1-Distill-Qwen-7B

Hugging Face
Text generation · Model size: 7.6B · Quant: FP8 · Context length: 32k · Concurrency cost: 1 · Published: Jan 20, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights (warm)

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter language model developed by DeepSeek AI, distilled from the larger DeepSeek-R1 reasoning model and based on the Qwen2.5-Math-7B architecture. This model is specifically fine-tuned using reasoning patterns generated by DeepSeek-R1, aiming to transfer advanced reasoning capabilities to a smaller, more efficient dense model. It demonstrates strong performance across mathematical, coding, and general reasoning benchmarks, making it suitable for applications requiring robust analytical problem-solving.


Overview

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter model developed by DeepSeek AI, part of a series of distilled models derived from the larger DeepSeek-R1. DeepSeek-R1 itself is a 671 billion total parameter (37 billion activated) Mixture-of-Experts (MoE) model trained via large-scale reinforcement learning (RL) to excel in reasoning tasks without initial supervised fine-tuning (SFT).

Key Capabilities

  • Reasoning Distillation: This model leverages reasoning patterns generated by the powerful DeepSeek-R1, transferring advanced analytical capabilities to a smaller, dense architecture. This approach aims to achieve strong reasoning performance in a more compact form factor.
  • Performance: The model shows competitive results on various benchmarks, including AIME 2024 (55.5 pass@1), MATH-500 (92.8 pass@1), and LiveCodeBench (37.6 pass@1), indicating proficiency in mathematical and coding reasoning.
  • Base Model: It is built upon the Qwen2.5-Math-7B architecture, inheriting its foundational language understanding and generation capabilities.
  • Context Length: The base model supports a context length of up to 131,072 tokens, allowing for processing extensive inputs (the hosted configuration listed above serves a 32k window).
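As a sketch of how the distilled model might be queried for a math problem: the prompt wording below follows DeepSeek's published recommendation to ask for step-by-step reasoning with the final answer in `\boxed{}`; the generation call is shown as an assumption-laden illustration (in practice `tokenizer.apply_chat_template` handles the exact chat format), not the definitive invocation.

```python
# Minimal sketch of prompting an R1 distill for step-by-step math reasoning.
# The instruction wording follows DeepSeek's usage notes; treat it as
# illustrative rather than the one required format.

def format_reasoning_prompt(question: str) -> str:
    """Wrap a math question in the reasoning-style instruction DeepSeek
    recommends for its R1 distills (final answer in \\boxed{})."""
    return (
        question
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )

# Hypothetical generation call (loading ~7.6B parameters of weights, so
# shown as commented code only):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("unsloth/DeepSeek-R1-Distill-Qwen-7B")
# model = AutoModelForCausalLM.from_pretrained(
#     "unsloth/DeepSeek-R1-Distill-Qwen-7B", device_map="auto")
# messages = [{"role": "user",
#              "content": format_reasoning_prompt("What is 7 * 6?")}]
# inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
#                                  return_tensors="pt").to(model.device)
# out = model.generate(inputs, max_new_tokens=2048,
#                      temperature=0.6, top_p=0.95)
```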

Good for

  • Reasoning-intensive applications: Ideal for tasks requiring strong logical deduction, problem-solving, and complex analytical thinking, particularly in mathematics and code generation.
  • Resource-constrained environments: As a distilled model, it offers a more efficient alternative to larger reasoning models while retaining significant capabilities.
  • Research and Development: Provides a robust base for further fine-tuning or experimentation in reasoning-focused LLM applications.
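For reasoning-focused applications, note that R1-style distills emit their chain of thought between `<think>` tags before the final answer. A minimal sketch of separating the two (the tag names follow DeepSeek's documented output format; the helper itself is illustrative):

```python
# Sketch: splitting an R1-distill completion into its reasoning trace and
# final answer. DeepSeek's R1 models wrap the chain of thought in
# <think>...</think>; anything after the closing tag is the answer.

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer). If no </think> tag is present,
    treat the whole completion as the answer."""
    head, sep, tail = completion.partition("</think>")
    if not sep:
        return "", completion.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

Stripping the trace this way is useful when only the answer should be shown to end users or passed to downstream tools.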

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each preset tunes the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
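The page lists which sampler knobs the presets tune but not their values. A hedged sketch of how such a preset could be assembled into a request payload; the default values below are common reasoning-model choices, assumed for illustration, not the actual Featherless presets:

```python
# Sketch: layering a user sampler preset over illustrative defaults before
# building a completion request. The default values are assumptions, not
# the community presets shown on the page.

DEFAULT_SAMPLERS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
    "min_p": 0.0,
}

def build_request(prompt: str, **overrides) -> dict:
    """Return a request payload with preset samplers, letting the caller
    override any individual knob; unknown knobs are rejected early."""
    unknown = set(overrides) - set(DEFAULT_SAMPLERS)
    if unknown:
        raise ValueError(f"unknown sampler parameter(s): {sorted(unknown)}")
    return {
        "model": "unsloth/DeepSeek-R1-Distill-Qwen-7B",
        "prompt": prompt,
        **DEFAULT_SAMPLERS,
        **overrides,
    }
```

Merging dicts left to right keeps overrides authoritative while guaranteeing every sampler key is always present in the outgoing request.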