DeepSeek-R1-0528-Qwen3-8B Overview
DeepSeek-R1-0528-Qwen3-8B is an 8-billion-parameter model created by distilling the chain-of-thought of the larger DeepSeek-R1-0528 model into the Qwen3-8B architecture. DeepSeek-R1-0528 itself leverages algorithmic optimizations and increased computational resources to significantly improve reasoning and inference capabilities, especially in complex domains such as mathematics and programming. The model supports a context length of 32,768 tokens.
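For orientation, here is a minimal inference sketch using Hugging Face transformers. The Hub ID deepseek-ai/DeepSeek-R1-0528-Qwen3-8B and the sampling settings are assumptions for illustration, not official recommendations from the model card.

```python
# Minimal inference sketch. Assumes the checkpoint is published on the Hub
# under this ID and that the tokenizer ships a chat template; the sampling
# parameters below are illustrative, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models need a generous token budget for the chain of thought.
outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```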
Key Capabilities
- Enhanced Reasoning: Demonstrates substantial improvements on complex reasoning tasks; the base DeepSeek-R1-0528 raised its AIME 2025 accuracy from 70% to 87.5%, and the distilled Qwen3-8B variant matches the performance of Qwen3-235B-thinking on AIME 2024.
- Reduced Hallucination: Offers a lower hallucination rate compared to previous versions.
- Improved Function Calling & Code Generation: Provides enhanced support for function calling and a better experience for "vibe coding" (a prompt-construction sketch follows this list).
- Benchmark Performance: Achieves state-of-the-art performance among open-source models on AIME 2024, surpassing Qwen3-8B by 10.0 points.
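Since the function-calling interface for this checkpoint is not spelled out above, the following is a hedged sketch of how a tool definition could be rendered into a prompt with transformers. Whether this model's chat template actually accepts a `tools` argument is an assumption to verify against the model card; the `get_weather` function is a hypothetical stub.

```python
# Function-calling sketch. Assumes the checkpoint's chat template accepts a
# `tools` argument (supported by recent transformers for many chat models);
# verify against the model card before relying on this.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"  # hypothetical stub for illustration

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# transformers converts the function signature and docstring into a JSON
# schema tool definition that the chat template renders into the prompt.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt)  # inspect how the tool spec is injected before generating
```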
Good For
- Complex Problem Solving: Ideal for applications requiring deep reasoning, such as mathematical problem-solving and logical inference.
- Code-Related Tasks: Suitable for code generation, debugging, and other programming-centric use cases.
- Academic Research: The distilled chain-of-thought is highlighted as significant for academic research on reasoning models and for the development of small-scale models (see the parsing sketch below).
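Because the distilled chain-of-thought is itself an object of study, a small helper for separating the reasoning trace from the final answer can be useful. This sketch assumes the model wraps its reasoning in `<think>...</think>` tags, as other DeepSeek-R1 releases do; adjust if this checkpoint's template differs.

```python
# Splits a DeepSeek-R1-style completion into its reasoning trace and final
# answer. Assumes the <think>...</think> convention used by other
# DeepSeek-R1 releases.
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw model completion."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()  # no reasoning trace emitted
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 equals 4 because ...</think>The answer is 4."
trace, answer = split_reasoning(raw)
print(trace)   # "2 + 2 equals 4 because ..."
print(answer)  # "The answer is 4."
```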
This model is licensed under the MIT License, which permits commercial use and distillation.