deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

  • Visibility: Public
  • Parameters: 8B
  • Precision: FP8
  • Context length: 32768 tokens
  • License: MIT
  • Source: Hugging Face
Overview

DeepSeek-R1-0528-Qwen3-8B: Enhanced Reasoning in a Compact Model

DeepSeek-R1-0528-Qwen3-8B is an 8 billion parameter model from DeepSeek-AI, representing a significant advancement in reasoning capabilities for smaller-scale language models. It is produced by distilling the chain-of-thought of the larger DeepSeek-R1-0528 into the Qwen3 8B base model, transferring much of the larger model's reasoning ability into a compact architecture. The model focuses on depth of reasoning and inference, benefiting from the algorithmic optimizations and increased post-training compute that went into DeepSeek-R1-0528.
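
DeepSeek-R1-style models emit their chain-of-thought inside `<think>…</think>` tags before the final answer. A minimal sketch of separating the reasoning trace from the user-facing answer (the tag format follows DeepSeek's R1 releases; the helper name is illustrative):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style completion into (reasoning, answer).

    The model wraps its chain-of-thought in <think>...</think>;
    everything after the closing tag is the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block emitted; treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

completion = "<think>2 + 2 equals 4.</think>The answer is 4."
reasoning, answer = split_reasoning(completion)
```

This keeps the reasoning available for inspection or logging while only the answer is surfaced to the end user.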

Key Capabilities & Performance:

  • Superior Reasoning: Demonstrates outstanding performance in complex reasoning tasks across mathematics, programming, and general logic. For instance, it achieves 86.0% on AIME 2024, surpassing Qwen3 8B by 10.0% and matching Qwen3-235B-thinking.
  • Reduced Hallucination: Offers a lower hallucination rate compared to previous versions, leading to more reliable outputs.
  • Enhanced Function Calling: Provides improved support for function calling, making it more versatile for tool-use applications.
  • Code & Math Proficiency: Shows strong performance in coding benchmarks like LiveCodeBench (60.5%) and various math competitions (e.g., 76.3% on AIME 2025).
  • Qwen3 Compatibility: Shares the same model architecture as Qwen3-8B but utilizes the DeepSeek-R1-0528 tokenizer configuration.

Good for:

  • Academic Research: Particularly valuable for research into reasoning models and chain-of-thought distillation.
  • Industrial Development: Ideal for integrating advanced reasoning capabilities into small-scale applications where efficiency and performance are critical.
  • Complex Problem Solving: Excels in scenarios requiring deep logical inference, such as mathematical problem-solving and code generation.
  • Applications Requiring Function Calling: Its enhanced function-calling support makes it suitable for agentic workflows.
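
For agentic use, a function call from the model typically arrives as JSON naming a tool and its arguments. A minimal sketch of dispatching such a call to locally registered Python functions (the JSON shape and the tool names here are illustrative assumptions, not a fixed DeepSeek schema):

```python
import json

# Illustrative local tools; a real deployment registers its own.
def add(a: float, b: float) -> float:
    return a + b

def uppercase(text: str) -> str:
    return text.upper()

TOOLS = {"add": add, "uppercase": uppercase}

def dispatch_tool_call(raw: str):
    """Parse a JSON tool call like {"name": ..., "arguments": {...}}
    and invoke the matching registered function with its arguments."""
    call = json.loads(raw)
    func = TOOLS.get(call["name"])
    if func is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return func(**call["arguments"])

result = dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

In a full loop, the dispatch result would be fed back to the model as a tool response so it can continue reasoning with the returned value.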