deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter language model from DeepSeek AI, distilled from the larger DeepSeek-R1 model and built on the Qwen2.5-Math-7B base. Fine-tuned on reasoning data generated by DeepSeek-R1, it excels at mathematical, coding, and general reasoning tasks, and its strong performance on complex problem-solving makes it well suited to applications that require robust analytical capabilities.
DeepSeek-R1-Distill-Qwen-7B: Reasoning Capabilities in a Compact Model
DeepSeek-R1-Distill-Qwen-7B is a 7.6 billion parameter model from DeepSeek AI, part of the DeepSeek-R1 series. It is a distilled version of the larger DeepSeek-R1, fine-tuned on reasoning patterns generated by its predecessor, and built upon the Qwen2.5-Math-7B base model. This approach demonstrates that complex reasoning capabilities can be effectively transferred to smaller, dense models.
Key Capabilities
- Enhanced Reasoning: Benefits from distillation of advanced reasoning patterns, showing strong performance in math, code, and general reasoning benchmarks.
- Long Context Understanding: Supports a substantial context length of 131,072 tokens, enabling processing of extensive inputs.
- Performance: Achieves competitive results across various benchmarks, including AIME 2024 (55.5% pass@1), MATH-500 (92.8% pass@1), and LiveCodeBench (37.6% pass@1).
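The pass@1 scores above are typically reported using the standard unbiased pass@k estimator (compute n samples per problem, count the c correct ones, then estimate the chance that at least one of k draws succeeds). As a minimal sketch of that metric, with the sample counts below chosen purely for illustration:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which c
    are correct, solves the problem."""
    if n - c < k:
        # Fewer incorrect samples than draws: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 the estimator reduces to the fraction of correct samples,
# e.g. a problem solved in 9 of 16 illustrative generations:
print(pass_at_k(16, 9, 1))  # 0.5625
```

For k=1 this is simply c/n averaged over problems, which is why pass@1 is often described as per-sample accuracy.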
Good For
- Complex Problem Solving: Ideal for tasks requiring step-by-step reasoning, such as mathematical proofs, code generation, and logical deduction.
- Research and Development: Provides a powerful, open-source foundation for further research into model distillation and reasoning capabilities.
- Applications with Long Contexts: Suitable for use cases where processing and understanding very long documents or conversations are critical.