DeepSeek-R1: A Reasoning-Focused LLM

DeepSeek-R1 is a 70 billion parameter model from DeepSeek-AI, distinguished by its novel approach to developing reasoning capabilities primarily through large-scale reinforcement learning (RL). Unlike traditional methods that heavily rely on supervised fine-tuning (SFT) initially, DeepSeek-R1-Zero demonstrated that reasoning can emerge purely from RL. DeepSeek-R1 further refines this by incorporating cold-start data and a two-stage RL and SFT pipeline to enhance performance and address issues like repetition and poor readability.

Key Capabilities & Innovations

RL-Driven Reasoning: Validates that complex reasoning behaviors, including self-verification and reflection, can be incentivized through RL without initial SFT.
Performance: Achieves strong results across math, code, and general reasoning benchmarks, with DeepSeek-R1 showing performance comparable to OpenAI-o1.
Distillation: DeepSeek-AI has also open-sourced smaller, distilled models (DeepSeek-R1-Distill) that leverage the reasoning patterns of DeepSeek-R1, demonstrating that smaller models can achieve high performance when guided by larger, more capable models.

Usage Recommendations

Temperature: Recommended between 0.5-0.7 (0.6 for optimal results).
Prompting: Avoid system prompts; include all instructions within the user prompt.
Mathematical Tasks: Advised to include "Please reason step by step, and put your final answer within \boxed{}" in prompts.
Enforce Reasoning: To ensure thorough reasoning, enforce the model to start its response with "\n".

Overview

DeepSeek-R1: A Reasoning-Focused LLM

Key Capabilities & Innovations

Usage Recommendations

Full Model Card (README)