DeepSeek-R1-0528-Qwen3-8B is an 8 billion parameter language model developed by DeepSeek-AI, distilled from the DeepSeek-R1-0528 model's chain-of-thought and built upon the Qwen3 8B architecture. This model is specifically optimized for enhanced reasoning and inference capabilities, demonstrating significant improvements in mathematics, programming, and general logic tasks. It achieves state-of-the-art performance among open-source models on benchmarks like AIME 2024, making it suitable for complex problem-solving applications.
Loading preview...
DeepSeek-R1-0528-Qwen3-8B: Enhanced Reasoning in a Compact Model
DeepSeek-R1-0528-Qwen3-8B is an 8 billion parameter model from DeepSeek-AI, representing a significant advancement in reasoning capabilities for smaller-scale language models. It is a distilled version of the more powerful DeepSeek-R1-0528, leveraging its chain-of-thought processes to enhance the Qwen3 8B base architecture. This model focuses on improving depth of reasoning and inference through algorithmic optimizations and increased computational resources during post-training.
Key Capabilities & Performance:
- Superior Reasoning: Demonstrates outstanding performance in complex reasoning tasks across mathematics, programming, and general logic. For instance, it achieves 86.0% on AIME 2024, surpassing Qwen3 8B by 10.0% and matching Qwen3-235B-thinking.
- Reduced Hallucination: Offers a lower hallucination rate compared to previous versions, leading to more reliable outputs.
- Enhanced Function Calling: Provides improved support for function calling, making it more versatile for tool-use applications.
- Code & Math Proficiency: Shows strong performance in coding benchmarks like LiveCodeBench (60.5%) and various math competitions (e.g., 76.3% on AIME 2025).
- Qwen3 Compatibility: Shares the same model architecture as Qwen3-8B but utilizes the DeepSeek-R1-0528 tokenizer configuration.
Good for:
- Academic Research: Particularly valuable for research into reasoning models and chain-of-thought distillation.
- Industrial Development: Ideal for integrating advanced reasoning capabilities into small-scale applications where efficiency and performance are critical.
- Complex Problem Solving: Excels in scenarios requiring deep logical inference, such as mathematical problem-solving and code generation.
- Applications requiring Function Calling: Its enhanced function calling support makes it suitable for agentic workflows.
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.