unsloth/DeepSeek-R1-Distill-Qwen-7B

Hugging Face
Text generation · Model size: 7.6B · Quant: FP8 · Context length: 32k · Concurrency cost: 1 · Published: Jan 20, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights (warm)

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter language model developed by DeepSeek AI, distilled from the larger DeepSeek-R1 reasoning model and based on the Qwen2.5-Math-7B architecture. This model is specifically fine-tuned using reasoning patterns generated by DeepSeek-R1, aiming to transfer advanced reasoning capabilities to a smaller, more efficient dense model. It demonstrates strong performance across mathematical, coding, and general reasoning benchmarks, making it suitable for applications requiring robust analytical problem-solving.


Overview

DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter model developed by DeepSeek AI, part of a series of distilled models derived from the larger DeepSeek-R1. DeepSeek-R1 itself is a 671 billion total parameter (37 billion activated) Mixture-of-Experts (MoE) model trained via large-scale reinforcement learning (RL) to excel in reasoning tasks without initial supervised fine-tuning (SFT).

Key Capabilities

  • Reasoning Distillation: This model leverages reasoning patterns generated by the powerful DeepSeek-R1, transferring advanced analytical capabilities to a smaller, dense architecture. This approach aims to achieve strong reasoning performance in a more compact form factor.
  • Performance: The model shows competitive results on various benchmarks, including AIME 2024 (55.5 pass@1), MATH-500 (92.8 pass@1), and LiveCodeBench (37.6 pass@1), indicating proficiency in mathematical and coding reasoning.
  • Base Model: It is built upon the Qwen2.5-Math-7B architecture, inheriting its foundational language understanding and generation capabilities.
  • Context Length: The base model supports a context length of up to 131,072 tokens, allowing for processing extensive inputs (the hosted configuration listed above serves a 32k window).
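As a sketch of how the distilled model might be queried for a math problem: the prompt wording below follows DeepSeek's published recommendation to ask for step-by-step reasoning with the final answer in `\boxed{}`; the generation call is shown as an assumption-laden illustration (in practice `tokenizer.apply_chat_template` handles the exact chat format), not the definitive invocation.

```python
# Minimal sketch of prompting an R1 distill for step-by-step math reasoning.
# The instruction wording follows DeepSeek's usage notes; treat it as
# illustrative rather than the one required format.

def format_reasoning_prompt(question: str) -> str:
    """Wrap a math question in the reasoning-style instruction DeepSeek
    recommends for its R1 distills (final answer in \\boxed{})."""
    return (
        question
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )

# Hypothetical generation call (loading ~7.6B parameters of weights, so
# shown as commented code only):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("unsloth/DeepSeek-R1-Distill-Qwen-7B")
# model = AutoModelForCausalLM.from_pretrained(
#     "unsloth/DeepSeek-R1-Distill-Qwen-7B", device_map="auto")
# messages = [{"role": "user",
#              "content": format_reasoning_prompt("What is 7 * 6?")}]
# inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
#                                  return_tensors="pt").to(model.device)
# out = model.generate(inputs, max_new_tokens=2048,
#                      temperature=0.6, top_p=0.95)
```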

Good for

  • Reasoning-intensive applications: Ideal for tasks requiring strong logical deduction, problem-solving, and complex analytical thinking, particularly in mathematics and code generation.
  • Resource-constrained environments: As a distilled model, it offers a more efficient alternative to larger reasoning models while retaining significant capabilities.
  • Research and Development: Provides a robust base for further fine-tuning or experimentation in reasoning-focused LLM applications.
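For reasoning-focused applications, note that R1-style distills emit their chain of thought between `<think>` tags before the final answer. A minimal sketch of separating the two (the tag names follow DeepSeek's documented output format; the helper itself is illustrative):

```python
# Sketch: splitting an R1-distill completion into its reasoning trace and
# final answer. DeepSeek's R1 models wrap the chain of thought in
# <think>...</think>; anything after the closing tag is the answer.

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer). If no </think> tag is present,
    treat the whole completion as the answer."""
    head, sep, tail = completion.partition("</think>")
    if not sep:
        return "", completion.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

Stripping the trace this way is useful when only the answer should be shown to end users or passed to downstream tools.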

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each preset tunes the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
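The page lists which sampler knobs the presets tune but not their values. A hedged sketch of how such a preset could be assembled into a request payload; the default values below are common reasoning-model choices, assumed for illustration, not the actual Featherless presets:

```python
# Sketch: layering a user sampler preset over illustrative defaults before
# building a completion request. The default values are assumptions, not
# the community presets shown on the page.

DEFAULT_SAMPLERS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
    "min_p": 0.0,
}

def build_request(prompt: str, **overrides) -> dict:
    """Return a request payload with preset samplers, letting the caller
    override any individual knob; unknown knobs are rejected early."""
    unknown = set(overrides) - set(DEFAULT_SAMPLERS)
    if unknown:
        raise ValueError(f"unknown sampler parameter(s): {sorted(unknown)}")
    return {
        "model": "unsloth/DeepSeek-R1-Distill-Qwen-7B",
        "prompt": prompt,
        **DEFAULT_SAMPLERS,
        **overrides,
    }
```

Merging dicts left to right keeps overrides authoritative while guaranteeing every sampler key is always present in the outgoing request.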