DeepSeek-R1-Distill-Qwen-1.5B Overview
DeepSeek-R1-Distill-Qwen-1.5B is a 1.5-billion-parameter model from DeepSeek-AI, part of the DeepSeek-R1 series. It is a distillation of the larger DeepSeek-R1: the Qwen2.5-Math-1.5B base model fine-tuned on reasoning data generated by DeepSeek-R1 itself. The model is designed to demonstrate that advanced reasoning capabilities can be transferred effectively to smaller, more efficient models.
Key Capabilities
- Reasoning Performance: Achieves strong results on mathematical, coding, and general reasoning benchmarks, including 28.9 pass@1 on AIME 2024, 83.9 pass@1 on MATH-500, and a 954 rating on Codeforces.
- Distilled Intelligence: Inherits reasoning patterns discovered by the 671B-parameter DeepSeek-R1 model, offering enhanced problem-solving abilities in a compact package.
- Context Length: Supports a context length of 131,072 tokens, enabling processing of extensive inputs; see the configuration check after this list.
- Efficiency: Packs a capable reasoning engine into a 1.5B-parameter model, making it suitable for resource-constrained environments.
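The 131,072-token figure can be confirmed directly from the published model configuration. A minimal check, assuming the Hugging Face transformers library and the Qwen2 config schema (where max_position_embeddings holds this limit):

```python
from transformers import AutoConfig

# Read the context window from the published config; in the Qwen2
# schema used by this model, max_position_embeddings holds the limit.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
print(config.max_position_embeddings)  # expected: 131072
```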
Usage Recommendations
- Prompting: Avoid system prompts; integrate all instructions within the user prompt. For mathematical problems, include a directive like "Please reason step by step, and put your final answer within \boxed{}".
- Temperature: Set the temperature within 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
- Enforced Reasoning: The model can occasionally skip its thinking pattern and emit an empty "<think>...</think>" block; to ensure thorough reasoning, enforce the model to begin its response with "<think>\n". A sketch applying these recommendations follows this list.
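Below is a minimal sketch of these recommendations using Hugging Face transformers. The model ID is the official repository name; the no-system-prompt format, the \boxed{} directive, the 0.6 temperature, and the appended "<think>\n" tag follow the guidance above, while the example question, max_new_tokens, and the top-p value (the sampling setting reported in DeepSeek's evaluations) are illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# No system prompt: every instruction, including the step-by-step
# directive, lives in the single user turn.
messages = [
    {
        "role": "user",
        "content": (
            "What is the sum of the first 100 positive integers? "
            "Please reason step by step, and put your final answer "
            "within \\boxed{}."
        ),
    }
]

# Render the chat template, then make sure the prompt ends with
# "<think>\n" so the model opens with its reasoning block instead of
# skipping it. (Some tokenizer revisions already append this tag.)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
if not prompt.endswith("<think>\n"):
    prompt += "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,  # illustrative; reasoning traces can run long
    do_sample=True,
    temperature=0.6,      # recommended range: 0.5-0.7
    top_p=0.95,
)
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
))
```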