unsloth/DeepSeek-R1-0528-Qwen3-8B

Public · 8B parameters · FP8 · 32768-token context · License: MIT · Hosted on Hugging Face

DeepSeek-R1-0528-Qwen3-8B Overview

This model is an 8-billion-parameter variant built on the Qwen3-8B architecture and post-trained by distilling the chain-of-thought of DeepSeek-R1-0528. The parent model's gains come from algorithmic optimizations and increased computational resources applied during post-training, which markedly improve reasoning and inference depth, especially in complex domains such as mathematics and programming. The model retains a 32768-token context length.
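
A quick way to try the model is the standard Hugging Face transformers interface. The sketch below is a minimal example under stated assumptions: the repository id is taken from the title of this card, the checkpoint ships a chat template (standard for the R1 family), and dtype/device handling is delegated to transformers.

```python
# Minimal inference sketch via Hugging Face transformers.
# Assumptions: repo id as in this card's title; the checkpoint provides a
# chat template; dtype/device placement is delegated to transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-0528-Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # honor the checkpoint's stored dtype
    device_map="auto",    # place weights on available GPUs automatically
)

messages = [{"role": "user", "content": "Prove that the sum of two odd numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models produce long chains of thought, so allow generous output
# headroom; the full context window is 32768 tokens.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```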

Key Capabilities

  • Enhanced Reasoning: Demonstrates substantial improvements on complex reasoning tasks; the parent DeepSeek-R1-0528 rose from 70% to 87.5% accuracy on AIME 2025, and the distilled Qwen3-8B variant matches Qwen3-235B-thinking on AIME 2024 (a sketch for extracting the emitted reasoning trace follows this list).
  • Reduced Hallucination: Offers a lower hallucination rate compared to previous versions.
  • Improved Function Calling & Code Generation: Provides enhanced support for function calling and a better experience for "vibe coding."
  • Benchmark Performance: Achieves state-of-the-art performance among open-source models on AIME 2024, surpassing Qwen3 8B by +10.0%.
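
R1-family models emit their chain-of-thought wrapped in <think>…</think> tags ahead of the final answer. The helper below is a minimal sketch for separating the two; the tag convention is carried over from the parent R1 release and assumed to hold for this distilled variant, and split_reasoning is a hypothetical name.

```python
# Split an R1-style completion into (reasoning trace, final answer).
# Assumes the <think>...</think> convention of the R1 family; if the tags
# are missing, the whole completion is treated as the answer.
def split_reasoning(completion: str) -> tuple[str, str]:
    open_tag, close_tag = "<think>", "</think>"
    if open_tag in completion and close_tag in completion:
        start = completion.index(open_tag) + len(open_tag)
        end = completion.index(close_tag)
        return completion[start:end].strip(), completion[end + len(close_tag):].strip()
    return "", completion.strip()

reasoning, answer = split_reasoning(
    "<think>An odd number is 2k+1; the sum (2a+1)+(2b+1)=2(a+b+1).</think>"
    "The sum of two odd numbers is even."
)
print(answer)  # -> The sum of two odd numbers is even.
```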

Good For

  • Complex Problem Solving: Ideal for applications requiring deep reasoning, such as mathematical problem-solving and logical inference (recommended decoding settings are sketched after this list).
  • Code-Related Tasks: Suitable for code generation, debugging, and other programming-centric use cases.
  • Academic Research: The distilled chain-of-thought is significant for academic research on reasoning models and for the development of small-scale models.
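
For the problem-solving workloads above, decoding settings matter: the parent R1 release advises sampling rather than greedy decoding to avoid repetition. The values below follow that published guidance (temperature 0.6, top_p 0.95) and are assumptions for this specific checkpoint; the snippet reuses model, tokenizer, and inputs from the loading sketch earlier.

```python
# Sampling settings following the parent R1 release's guidance; treat them
# as starting points for this distilled checkpoint, not confirmed values.
# Reuses `model`, `tokenizer`, and `inputs` from the loading sketch above.
outputs = model.generate(
    inputs,
    max_new_tokens=8192,  # math and code derivations can run long
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```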

This model is licensed under the MIT License, supporting commercial use and distillation.