the-ai-alchemist/DeepSeek-R1-Distill-Qwen-14B-nethack

Text Generation

  • Concurrency Cost: 1
  • Model Size: 14.8B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Jun 23, 2026
  • Architecture: Transformer
  • Cold

the-ai-alchemist/DeepSeek-R1-Distill-Qwen-14B-nethack is a 14.8-billion-parameter language model, fine-tuned and converted to GGUF format by the-ai-alchemist. It is based on the DeepSeek-R1-Distill-Qwen architecture and was fine-tuned with Unsloth for faster training. It is designed for general language generation tasks, and GGUF files are provided for efficient local deployment.


Model Overview

the-ai-alchemist/DeepSeek-R1-Distill-Qwen-14B-nethack is a 14.8-billion-parameter language model that has been fine-tuned and converted to GGUF format. It builds on the DeepSeek-R1-Distill-Qwen architecture and was developed by the-ai-alchemist.

Key Characteristics

  • Architecture: Based on the DeepSeek-R1-Distill-Qwen model family.
  • Parameter Count: Features 14.8 billion parameters, offering a balance between performance and computational requirements.
  • GGUF Format: Provided in GGUF format, making it compatible with llama.cpp and other GGUF-supporting inference engines for efficient CPU and GPU inference.
  • Optimization: Fine-tuned and converted to GGUF using Unsloth, which is credited with a roughly 2x faster training process.
  • Context Length: Supports a context length of 32768 tokens.
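
To make the "computational requirements" of a 14.8B-parameter model more concrete, the following is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes every weight is stored at the stated precision, which is a simplification: real GGUF files keep some tensors at other precisions, and the KV cache and runtime buffers add to the total. The helper name `weight_memory_gb` is hypothetical. The 8.5 bits-per-weight figure for Q8_0 follows from its block layout (32 int8 weights plus one 16-bit scale).

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 14.8e9  # parameter count stated on this page

# Q8_0 block: 32 one-byte weights + one 2-byte scale = 34 bytes per 32 weights
Q8_0_BITS_PER_WEIGHT = 34 * 8 / 32  # 8.5 bits

# Assumption: all weights are stored at the listed precision (weights only,
# no KV cache, no activations).
estimates = {
    "FP8": weight_memory_gb(N_PARAMS, 8.0),
    "Q8_0": weight_memory_gb(N_PARAMS, Q8_0_BITS_PER_WEIGHT),
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB for weights")
```

Under these assumptions the weights need roughly 15 to 16 GB, so actual memory use at the full 32768-token context will be noticeably higher once the KV cache is included.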

Usage and Compatibility

This model is intended to be run with llama-cli for text-only applications, or with llama-mtmd-cli for potential multimodal applications, in both cases using Jinja templating. A GGUF file, DeepSeek-R1-Distill-Qwen-14B.Q8_0.gguf, is available for download. The model's BOS token behavior was adjusted to ensure full GGUF compatibility.
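
As an illustration of the text-only path, here is a minimal sketch that assembles a llama-cli invocation for the GGUF file named above. It assumes llama-cli is on your PATH and that the file sits in the current directory. The helper name `build_llama_cli_command`, the prompt, and the token budget are hypothetical. The flags used (`-m`, `--jinja`, `-c`, `-n`, `-p`) are standard llama.cpp options, but it is worth checking them against `llama-cli --help` for your build.

```python
import shlex


def build_llama_cli_command(model_path, prompt, ctx_size=32768, n_predict=256):
    """Return the argument list for a text-only llama-cli run."""
    return [
        "llama-cli",
        "-m", model_path,       # path to the GGUF file
        "--jinja",              # use the model's Jinja chat template
        "-c", str(ctx_size),    # context window (32768 per this page)
        "-n", str(n_predict),   # hypothetical cap on generated tokens
        "-p", prompt,
    ]


cmd = build_llama_cli_command(
    "DeepSeek-R1-Distill-Qwen-14B.Q8_0.gguf",       # assumed local path
    "Explain what the GGUF format is in one sentence.",  # hypothetical prompt
)

# Print a copy-pasteable shell command instead of running it here.
print(shlex.join(cmd))
```

To actually launch the model, the same list can be passed to `subprocess.run(cmd, check=True)`. Lowering `ctx_size` is a reasonable first step if memory is tight.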