alamios/DeepSeek-R1-DRAFT-Qwen2.5-0.5B

Task: Text generation | Model size: 0.5B | Quant: BF16 | Context length: 32k | Published: Feb 6, 2025 | License: apache-2.0 | Architecture: Transformer | Open weights | Concurrency cost: 1

alamios/DeepSeek-R1-DRAFT-Qwen2.5-0.5B is a 0.5-billion-parameter, Qwen2.5-0.5B-based draft model for speculative decoding, paired with DeepSeek-R1-Distill-Qwen-32B as its target model. It is intended to accelerate generation for users running the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF build on RTX 3090/4090 GPUs, and it supports a 32,768-token context length. The goal is faster inference without giving up context length or output quality. The model was trained on code, math, reasoning, and general-knowledge tasks.


DeepSeek-R1-DRAFT-Qwen2.5-0.5B Overview

This model, developed by alamios, is a 0.5 billion parameter draft model intended for speculative decoding. It is specifically designed to work in conjunction with the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B model, aiming to significantly speed up generation.

Key Capabilities & Features

  • Speculative Decoding: Acts as a draft model that cheaply proposes several tokens at a time, which the target model then verifies in a single forward pass. It is built specifically to accelerate DeepSeek-R1-Distill-Qwen-32B, particularly the Q4_K_M GGUF build.
  • Hardware Optimization: Tailored for users of NVIDIA RTX 3090/4090 GPUs (24 GB VRAM), enabling faster generation without sacrificing context length or model quality.
  • Context Length: Supports a 32,768-token context length.
  • Training Data: Trained for two epochs on 7k unique examples spanning code, math, reasoning, and general-knowledge tasks.
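The draft-and-verify loop behind speculative decoding can be sketched as follows. This is a toy illustration of greedy speculative decoding, not code from this model card, llama.cpp, or any other library: `toy_target` and `toy_draft` are hypothetical stand-ins for DeepSeek-R1-Distill-Qwen-32B and this draft model, and `gamma` (tokens drafted per round) is an assumed parameter. Real engines verify all drafted positions in one batched forward pass and, when sampling, use a probabilistic acceptance rule instead of exact matching.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: maps a token prefix to the
# greedy (argmax) next token. Real models return logits over the vocabulary.
Model = Callable[[List[int]], int]


def greedy(target: Model, prompt: List[int], n: int) -> List[int]:
    """Plain greedy decoding with the target model only (one pass per token)."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_greedy(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, gamma: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (new tokens, target passes used)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new_tokens:
        # 1. The small draft model proposes gamma tokens autoregressively.
        proposal: List[int] = []
        for _ in range(gamma):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. A real engine does this
        #    in ONE batched forward pass; this toy calls target() per position
        #    but counts the whole check as a single pass.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if tok != expected:
                break  # first mismatch: discard the rest of the proposal
        else:
            # All gamma drafted tokens accepted: the same pass yields a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[len(prompt):][:max_new_tokens], target_passes


def toy_target(prefix: List[int]) -> int:
    return (sum(prefix) * 7 + 3) % 10


def toy_draft(prefix: List[int]) -> int:
    # Agrees with the target except when the prefix length is a multiple of 5.
    t = toy_target(prefix)
    return t if len(prefix) % 5 else (t + 1) % 10


prompt = [1, 2, 3]
spec, passes = speculative_greedy(toy_target, toy_draft, prompt, max_new_tokens=20)
assert spec == greedy(toy_target, prompt, 20)  # same tokens as target-only decoding
print(passes)  # 5 target passes instead of 20
```

Every token kept is the one the target itself would have chosen, which is why a draft model can change speed but not the output: a poor draft only lowers the number of tokens accepted per target pass.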

Ideal Use Case

This model is best used as a companion for anyone already running DeepSeek-R1-Distill-Qwen-32B who wants faster inference, especially on 24 GB NVIDIA cards such as the RTX 3090/4090. It speeds up output while preserving the larger model's quality and context length: in standard speculative decoding the 32B model verifies every drafted token, so the draft model affects how fast tokens are produced, not which tokens are produced.
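How much faster generation becomes depends on how often the 32B model accepts the draft's tokens. A rough estimate comes from the analysis in Leviathan et al. (2023), which assumes each drafted token is accepted independently with probability alpha, that verifying gamma + 1 positions costs about one target step, and that one draft step costs c target steps. Under those assumptions the expected speedup is (1 - alpha^(gamma+1)) / ((1 - alpha)(gamma*c + 1)). The sketch evaluates that formula; the alpha, gamma, and c values are illustrative assumptions, not measurements of this model pair.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens per target forward pass, assuming i.i.d. acceptance.

    alpha: assumed probability that the target accepts a drafted token.
    gamma: number of tokens the draft proposes per round.
    """
    if alpha >= 1.0:
        return float(gamma + 1)  # every draft accepted, plus one bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def walltime_speedup(alpha: float, gamma: int, c: float) -> float:
    """Speedup over target-only decoding; c = cost of one draft step / one target step."""
    return expected_tokens_per_pass(alpha, gamma) / (gamma * c + 1)


# Illustrative numbers only, not measured for this model pair.
print(round(expected_tokens_per_pass(0.8, 4), 4))  # 3.3616
print(round(walltime_speedup(0.8, 4, 0.05), 4))    # 2.8013
print(round(walltime_speedup(0.0, 4, 0.05), 4))    # 0.8333
```

The last line shows why the draft must be trained to imitate its target: a draft whose tokens are never accepted still costs gamma * c per round, so it makes generation slower than running the 32B model alone.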