amd/PARD-DeepSeek-R1-Distill-Qwen-1.5B
Text Generation · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Concurrency Cost: 1 · Published: May 17, 2025 · License: MIT · Architecture: Transformer · Open Weights

amd/PARD-DeepSeek-R1-Distill-Qwen-1.5B is a 1.5-billion-parameter model developed by AMD, designed as a parallel draft model for accelerating large language model (LLM) inference via speculative decoding. It is built with the PARD (PARallel Draft) method, which adapts an existing model into a draft model at low cost while retaining high performance. A notable property is generalizability: a single PARD draft model can accelerate an entire family of target models without retraining for each new target, unlike many other speculative-decoding approaches. The model is optimized for inference speed and achieves significant speedups over plain autoregressive generation and earlier speculative-decoding methods.
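To illustrate the draft-and-verify idea behind speculative decoding, here is a minimal toy sketch. The `draft_next` and `target_next` functions are hypothetical stand-ins for the draft and target models (not PARD or DeepSeek themselves); the loop shows how a cheap draft proposes a block of tokens and the target verifies the whole block in one pass, so the output exactly matches greedy target decoding while needing far fewer target passes.

```python
# Toy greedy speculative decoding. Both "models" below are hypothetical
# deterministic stand-ins, not the real PARD draft or its target model.

def target_next(ctx):
    # Hypothetical target model: deterministic toy next-token rule.
    return (ctx[-1] * 31 + 7) % 100

def draft_next(ctx):
    # Hypothetical draft model: agrees with the target most of the time.
    t = target_next(ctx)
    return t if ctx[-1] % 5 != 0 else (t + 1) % 100

def speculative_decode(prompt, steps, k=4):
    seq = list(prompt)
    target_calls = 0
    while len(seq) < len(prompt) + steps:
        # Draft proposes k tokens autoregressively (cheap per step).
        proposal, ctx = [], list(seq)
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # Target scores all k positions at once (one forward pass in a
        # real model; simulated here with a sequential check).
        target_calls += 1
        accepted, ctx = [], list(seq)
        for tok in proposal:
            t = target_next(ctx)
            if t == tok:
                accepted.append(tok)
                ctx.append(tok)
            else:
                # Target disagrees: keep its token and stop the block.
                accepted.append(t)
                break
        else:
            # All k draft tokens accepted: add one bonus target token.
            accepted.append(target_next(ctx))
        seq.extend(accepted)
    return seq[:len(prompt) + steps], target_calls

out, calls = speculative_decode([3], steps=20, k=4)
# Output is identical to plain greedy target decoding, but the target
# is invoked far fewer times than the number of tokens generated.
```

The key property the sketch demonstrates is losslessness: because the target verifies every proposed token, the final sequence is exactly what greedy decoding of the target alone would produce; the draft only changes how many target passes are needed.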
