unsloth/DeepSeek-R1-0528-Qwen3-8B
DeepSeek-R1-0528-Qwen3-8B is an 8-billion-parameter language model from DeepSeek AI, built on the Qwen3 architecture with a 32,768-token context length. It is distilled from the chain-of-thought of DeepSeek-R1-0528, which significantly enhances its reasoning capabilities, particularly in mathematics and programming. It achieves state-of-the-art performance among open-source models on benchmarks such as AIME 2024, making it well suited to complex reasoning tasks and code generation.
DeepSeek-R1-0528-Qwen3-8B Overview
This model is an 8-billion-parameter variant built on the Qwen3-8B architecture and distilled from the chain-of-thought of the more advanced DeepSeek-R1-0528 model. The parent model leveraged algorithmic optimizations and increased computational resources to significantly improve reasoning and inference, especially in complex domains such as mathematics and programming, and distillation transfers much of that capability to the 8B scale. The model retains a substantial context length of 32,768 tokens.
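As a sketch of how a prompt might be assembled for this model, the snippet below builds an R1-style request. The `<｜User｜>`/`<｜Assistant｜>` markers, the forced `<think>` prefix, and the sampling values (temperature 0.6, top-p 0.95) follow conventions published for the R1 family; treat all of them as assumptions and defer to the tokenizer's own chat template in practice.

```python
# Sketch of prompt assembly for an R1-style reasoning model.
# Assumptions: the DeepSeek-R1 chat markers <｜User｜>/<｜Assistant｜>
# and the recommendation to force the response to open with "<think>\n"
# so the chain-of-thought is always emitted. Verify against the
# tokenizer's chat template before relying on this format.

def build_prompt(question: str, force_think: bool = True) -> str:
    """Wrap a user question in R1-style chat markers."""
    prompt = f"<｜User｜>{question}<｜Assistant｜>"
    if force_think:
        # Prefixing "<think>\n" nudges the model into its reasoning mode.
        prompt += "<think>\n"
    return prompt

# Sampling settings commonly recommended for R1-family models --
# a starting point, not a guarantee of best results.
GEN_KWARGS = {"temperature": 0.6, "top_p": 0.95, "max_new_tokens": 32768}
```

In serving stacks that apply the chat template automatically (e.g. an OpenAI-compatible endpoint), only the sampling settings are needed; the manual markers are for raw-completion use.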
Key Capabilities
- Enhanced Reasoning: Demonstrates substantial improvements on complex reasoning tasks; on AIME 2025, the base DeepSeek-R1-0528 improved from 70% to 87.5% accuracy over its predecessor. The distilled Qwen3-8B version also performs strongly, matching Qwen3-235B-thinking on AIME 2024.
- Reduced Hallucination: Offers a lower hallucination rate compared to previous versions.
- Improved Function Calling & Code Generation: Provides enhanced support for function calling and a better experience for "vibe coding."
- Benchmark Performance: Achieves state-of-the-art performance among open-source models on AIME 2024, surpassing Qwen3 8B by +10.0%.
Good For
- Complex Problem Solving: Ideal for applications requiring deep reasoning, such as mathematical problem-solving and logical inference.
- Code-Related Tasks: Suitable for code generation, debugging, and other programming-centric use cases.
- Academic Research: The distilled chain-of-thought is highlighted as significant for academic research on reasoning models and small-scale model development.
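For the reasoning and code-generation uses above, applications typically need to separate the model's chain-of-thought from its final answer. The helper below is a minimal sketch assuming the `<think>…</think>` convention used by R1-style models; the function name and fallback behavior are illustrative choices, not part of the model's API.

```python
# Sketch: splitting R1-style output into chain-of-thought and final answer.
# Assumes reasoning is wrapped in <think>...</think>; if the closing tag is
# absent (e.g. generation was truncated), the whole text is treated as answer.

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from raw model output."""
    marker = "</think>"
    if marker in text:
        thought, answer = text.split(marker, 1)
        return thought.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()
```

Discarding the `<think>` span before showing output to users (or before feeding the turn back into a conversation) keeps responses concise and the context window free for the 32,768-token budget.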
This model is licensed under the MIT License, supporting commercial use and distillation.