Model Overview
mkurman/Qwen2.5-14B-DeepSeek-R1-1M is a 14.8-billion-parameter language model created by mkurman through a merge operation. It combines the strengths of two base models: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, for enhanced reasoning, and Qwen/Qwen2.5-14B-Instruct-1M, for its exceptional long-context capabilities, supporting contexts of up to 131,072 tokens.
Key Capabilities
- Hybrid Performance: Combines strong reasoning abilities with the capacity to process and understand very long input contexts.
- Versatile Application: Designed for a broad range of tasks that benefit from both deep logical understanding and extensive contextual awareness.
- Qwen2.5 Architecture: Built upon the Qwen2.5 family, known for its robust performance.
- Developer-Friendly: Includes minor tokenizer adjustments for seamless integration and provides usage examples with the transformers library.
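As a minimal sketch of the transformers usage mentioned above (the exact generation settings and prompt are illustrative assumptions, not taken from this card; running it requires enough GPU or CPU memory for a 14.8B model):

```python
# Hedged example: loading the merged model via Hugging Face transformers.
# The prompt and max_new_tokens value below are illustrative choices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mkurman/Qwen2.5-14B-DeepSeek-R1-1M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # shard across available GPUs / CPU
)

# Chat-style input, formatted with the model's own chat template.
messages = [
    {"role": "user",
     "content": "Write a Python function that checks if a number is prime."}
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:],
                       skip_special_tokens=True))
```

Because the base models support very long contexts, the same pattern applies to long-document prompts; only the contents of `messages` change.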
Good For
- Complex Reasoning Tasks: Ideal for scenarios requiring advanced logical inference and problem-solving over detailed information.
- Long Document Analysis: Suitable for applications like summarizing lengthy reports, legal documents, or codebases where extended context is crucial.
- Code Generation: The influence of the underlying DeepSeek-R1 distillation suggests the model is well suited to programming tasks, as the example prompt demonstrates.
- Local Deployment: GGUF files are available, enabling use with tools like LM Studio or Ollama for local inference.