DeepSeek-R1: Enhanced Reasoning Model
cminst/DSR17B-templatefixes is a 7.6-billion-parameter model derived from DeepSeek-R1, developed by DeepSeek-AI. The model focuses on advanced reasoning: the underlying DeepSeek-R1 was trained with large-scale reinforcement learning (RL) without an initial supervised fine-tuning (SFT) stage, encouraging the model to explore complex chains of thought (CoT). Early issues such as repetition and poor readability were addressed by incorporating cold-start data before the RL phase, which significantly improved reasoning performance.
Key Capabilities
- Advanced Reasoning: Excels in math, code, and general reasoning tasks, achieving performance comparable to OpenAI-o1.
- RL-Driven Development: Demonstrates that reasoning can be incentivized purely through RL, enabling self-verification and reflection.
- Distillation Potential: The DeepSeek-R1 framework supports distilling reasoning patterns into smaller models, leading to strong performance in more compact architectures.
- Extended Context: Supports a 32,768-token context window for longer and more complex inputs.
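When working with the 32,768-token context window, it helps to budget prompt and generation tokens together so the total never exceeds the window. A minimal sketch (the helper name and token counts are illustrative, not part of any official API):

```python
MAX_CONTEXT = 32768  # context window stated on this card

def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    max_context: int = MAX_CONTEXT) -> bool:
    """Check whether the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= max_context

print(fits_in_context(30000, 2048))  # True: 32048 <= 32768
print(fits_in_context(31000, 2048))  # False: 33048 > 32768
```

In practice, count `prompt_tokens` with the model's own tokenizer rather than estimating from characters.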
Usage Recommendations
- Optimal performance is achieved with a temperature between 0.5 and 0.7 (0.6 is recommended).
- Avoid system prompts; integrate all instructions within the user prompt.
- For mathematical problems, include a directive like "Please reason step by step, and put your final answer within \boxed{}".
- Enforce the model to begin its response with "<think>\n" to ensure thorough reasoning.
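The recommendations above can be sketched as a small prompt-construction helper. This is an illustrative sketch, not an official API: the function name and message format (an OpenAI-style chat message list) are assumptions.

```python
def build_messages(question: str, is_math: bool = False) -> list[dict]:
    """Build a chat message list per this card's recommendations:
    no system prompt; all instructions go into the user turn."""
    directive = "Please reason step by step, and put your final answer within \\boxed{}."
    content = f"{question}\n{directive}" if is_math else question
    return [{"role": "user", "content": content}]

# Sampling settings from this card: temperature 0.6 (within the 0.5-0.7 range).
SAMPLING = {"temperature": 0.6}

messages = build_messages("Compute the sum of the first 100 positive integers.",
                          is_math=True)
print(messages[0]["role"])  # user
```

Pass `messages` and `SAMPLING` to whatever chat-completion interface serves the model, and prefill the assistant turn with "<think>\n" if your serving stack supports response prefixes.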