amd/PARD-Qwen2.5-0.5B
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: May 17, 2025 · License: MIT · Architecture: Transformer · Open Weights

amd/PARD-Qwen2.5-0.5B is a 0.5-billion-parameter parallel draft (PARD) model based on Qwen2.5 and developed by AMD. It accelerates large language model (LLM) inference through speculative decoding: the small draft model proposes candidate tokens that a larger target LLM then verifies, yielding significant speedups over purely autoregressive generation. PARD's low-cost adaptation method allows the same draft model to accelerate various target LLMs.
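The draft-then-verify loop described above can be sketched in a few lines. This is a minimal illustrative example of greedy speculative decoding with toy stand-in next-token functions; it is not the PARD implementation, and the `draft_next`/`target_next` functions are invented for demonstration only.

```python
def draft_next(token: int) -> int:
    # Toy "cheap draft model": usually agrees with the target below.
    return (token + 1) % 10

def target_next(token: int) -> int:
    # Toy "expensive target model": disagrees with the draft at token 4.
    return 7 if token == 4 else (token + 1) % 10

def speculative_step(prefix: list[int], k: int) -> list[int]:
    """Draft k tokens, then verify them with the target model.

    Accepted draft tokens are kept; at the first mismatch the target's
    own token is emitted and the remaining drafts are discarded. With
    greedy decoding, acceptance is a simple exact-match test.
    """
    # 1. Autoregressive drafting with the cheap model.
    drafts = []
    cur = prefix[-1]
    for _ in range(k):
        cur = draft_next(cur)
        drafts.append(cur)

    # 2. Target verifies all k positions (in a real system this is
    #    a single batched forward pass, which is where the speedup
    #    over token-by-token generation comes from).
    out = list(prefix)
    prev = prefix[-1]
    for d in drafts:
        t = target_next(prev)
        out.append(t)      # the target's token is always safe to emit
        if t != d:         # first mismatch: drop the remaining drafts
            break
        prev = t
    return out
```

Because the target's token is emitted at every verified position, the output is identical to what the target model alone would produce; the draft model only changes how many target forward passes are needed.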
