deepseek-ai/DeepSeek-R1-0528

Parameters: 685B
Precision: FP8
Context length: 32,768 tokens
License: MIT
Weights: available on Hugging Face
Overview

DeepSeek-R1-0528: Enhanced Reasoning and Inference

DeepSeek-R1-0528 is an upgraded version of the DeepSeek-R1 model from DeepSeek AI, focused on significantly improved reasoning and inference capabilities. This 685-billion-parameter model leverages increased computational resources and algorithmic optimizations during post-training, achieving performance approaching that of leading models such as OpenAI o3 and Gemini 2.5 Pro.

Key Capabilities and Improvements

  • Enhanced Reasoning Depth: Demonstrates substantial improvements in handling complex reasoning tasks, evidenced by an increase in AIME 2025 test accuracy from 70% to 87.5%. The model now uses an average of 23K tokens per question for deeper thought processes, up from 12K.
  • Reduced Hallucination: Offers a lower hallucination rate compared to its previous version.
  • Improved Function Calling: Provides enhanced support for function calling.
  • Vibe Coding Experience: Delivers an improved experience for "vibe coding" (rapid, conversational code generation driven by natural-language prompts).
  • Benchmark Performance: Shows strong performance across various benchmarks, including MMLU-Redux (93.4), GPQA-Diamond (81.0), LiveCodeBench (73.3), and AIME 2025 (87.5).
  • Distillation for Smaller Models: The chain-of-thought from DeepSeek-R1-0528 has been used to post-train DeepSeek-R1-0528-Qwen3-8B, achieving state-of-the-art performance among open-source models on AIME 2024.
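Models in the DeepSeek-R1 family conventionally emit their chain-of-thought inside `<think>...</think>` tags before the final answer. A minimal sketch for separating the reasoning trace from the answer in a completion string (the tag convention is an assumption based on the R1 family; verify against your provider's output format):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (chain_of_thought, final_answer).

    Assumes the DeepSeek-R1 convention of wrapping reasoning in
    <think>...</think> tags; if no tags are found, the whole text
    is treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Toy completion illustrating the expected shape:
sample = "<think>2 + 2 equals 4.</think>The answer is 4."
cot, answer = split_reasoning(sample)
# cot holds the reasoning trace; answer holds the user-facing reply.
```

Keeping the trace separate is useful both for display (many UIs collapse the reasoning) and for distillation pipelines like the one used for DeepSeek-R1-0528-Qwen3-8B.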

Usage Notes

  • Supports system prompts.
  • No longer requires forcing the "<think>\n" prefix at the start of the output to activate the thinking pattern.
  • Maximum generation length in official evaluations is 64K tokens.
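Since the model supports system prompts and standard chat formatting, a request body for an OpenAI-compatible chat endpoint might look like the sketch below. The model identifier matches this card; the endpoint, client, and sampling parameters are assumptions to adapt to your provider:

```python
# Minimal sketch of a chat request body for an OpenAI-compatible
# endpoint (assumption: your provider accepts this schema).
payload = {
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
        # System prompts are supported in this release.
        {"role": "system", "content": "You are a careful math tutor."},
        {"role": "user", "content": "Prove that the sum of two even numbers is even."},
    ],
    # Leave headroom for long reasoning traces; the model averages
    # ~23K thinking tokens per question on hard benchmarks.
    "max_tokens": 32768,
    # 0.6 is a commonly recommended temperature for R1-series models.
    "temperature": 0.6,
}
```

The payload can then be POSTed to the provider's `/chat/completions` route with any HTTP client.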

This model is suitable for applications demanding advanced logical reasoning, mathematical problem-solving, and robust code generation.