mkurman/Qwen2.5-14B-DeepSeek-R1-1M is a 14.8 billion parameter merged language model that combines the reasoning capabilities of DeepSeek-R1-Distill-Qwen-14B with the extended context handling of Qwen2.5-14B-Instruct-1M. It targets tasks that require both strong reasoning and the processing of very long inputs, supporting contexts of up to 131,072 tokens.
Model Overview
This model was created by mkurman through a merge of two base models: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, which contributes enhanced reasoning, and Qwen/Qwen2.5-14B-Instruct-1M, which contributes long-context support of up to 131,072 tokens.
Key Capabilities
- Hybrid Performance: Combines strong reasoning abilities with the capacity to process and understand very long input contexts.
- Versatile Application: Designed for a broad range of tasks that benefit from both deep logical understanding and extensive contextual awareness.
- Qwen2.5 Architecture: Built upon the Qwen2.5 family, known for its robust performance.
- Developer-Friendly: Includes minor tokenizer adjustments for seamless integration and provides usage examples with the transformers library.
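A minimal usage sketch with the transformers library is shown below. The model ID and 131,072-token context limit come from this card; the generation settings (dtype, device map, max_new_tokens) are illustrative assumptions, not values prescribed by the model author.

```python
# Hedged usage sketch for mkurman/Qwen2.5-14B-DeepSeek-R1-1M.
# Generation settings here are illustrative assumptions.
from typing import Dict, List

MODEL_ID = "mkurman/Qwen2.5-14B-DeepSeek-R1-1M"
MAX_CONTEXT = 131_072  # maximum context length in tokens, per the card


def build_messages(user_prompt: str) -> List[Dict[str, str]]:
    """Wrap a prompt in the chat-message format used by apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]


def generate_reply(user_prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a reply (requires substantial GPU memory)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy import kept local

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Keeping the transformers import inside the function means the module can be inspected without pulling in the heavyweight dependencies or downloading weights.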
Good For
- Complex Reasoning Tasks: Ideal for scenarios requiring advanced logical inference and problem-solving over detailed information.
- Long Document Analysis: Suitable for applications like summarizing lengthy reports, legal documents, or codebases where extended context is crucial.
- Code Generation: The underlying DeepSeek model's influence suggests potential for programming-related tasks, as demonstrated by the example prompt.
- Local Deployment: GGUF files are available, enabling use with tools like LM Studio or Ollama for local inference.
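For local deployment with Ollama, a GGUF file can be wrapped in a Modelfile. This is a hedged sketch: the GGUF filename and quantization level below are hypothetical, and the num_ctx value is a conservative local setting rather than the model's full 131,072-token limit.

```
# Hypothetical Ollama Modelfile — the GGUF filename is illustrative.
FROM ./Qwen2.5-14B-DeepSeek-R1-1M-Q4_K_M.gguf
# Context window for local inference; raise toward 131072 if RAM/VRAM allows.
PARAMETER num_ctx 32768
```

The model can then be registered and run with `ollama create qwen-r1-1m -f Modelfile` followed by `ollama run qwen-r1-1m`.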