Overview
GLM-4.5-DRAFT-0.6B-v3.0: A Speculative Decoding Draft Model
`jukofyork/GLM-4.5-DRAFT-0.6B-v3.0` is a compact draft model of roughly 0.6 billion parameters engineered for speculative decoding, primarily with the GLM-4.5, GLM-4.5-Air, and GLM-4-32B-0414 series. It was initialized from `Qwen2.5-0.5B-Instruct` and underwent a vocabulary transplant using `transplant-vocab` to align its tokenization with the GLM-4.5 family (illustrated in the sketch below).
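As a rough illustration of the idea (not the actual procedure performed by the `transplant-vocab` tool, which also handles special tokens and other details), a vocabulary transplant re-initializes the donor model's embedding table so that it matches the target tokenizer. The repo id `zai-org/GLM-4.5-Air` below is an assumption; substitute the GLM-4.5 model you are targeting:

```python
# Conceptual sketch of a vocabulary transplant (illustration only; the real
# procedure is defined by the transplant-vocab tool, not this code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

donor = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
donor_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
target_tok = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air")  # assumed repo id

old_emb = donor.get_input_embeddings().weight.data
new_emb = torch.zeros(len(target_tok), old_emb.shape[1])

for token_id in range(len(target_tok)):
    # Re-encode each target token's text with the donor tokenizer and average
    # the corresponding donor embeddings as an initialization.
    text = target_tok.decode([token_id])
    donor_ids = donor_tok.encode(text, add_special_tokens=False)
    if donor_ids:
        new_emb[token_id] = old_emb[donor_ids].mean(dim=0)

donor.resize_token_embeddings(len(target_tok))
donor.get_input_embeddings().weight.data.copy_(new_emb)
```

The key point is that after the transplant the draft model emits token ids from the GLM-4.5 vocabulary, which is what allows the target model to verify its proposals directly.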
Key Capabilities & Features
- Speculative Decoding: Designed to accelerate inference when paired with larger GLM-4.5 base models.
- Extended Context Window: Supports a context length of up to 131,072 tokens, achieved by enabling YaRN (Yet another RoPE extensioN method) in the model's `config.json` (see the example after this list).
- Training Data: Fine-tuned on a diverse dataset of approximately 2.3 billion tokens, including samples from `agentlans/common-crawl-sample`, `bigcode/the-stack-smol-xl`, and `rombodawg/Everything_Instruct` (output field only).
- Efficient Training: Trained for one epoch using `qlora-pipe-lite` with a batch size of 60 and a sequence length of 32k tokens.
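As a minimal sketch of the `config.json` change, the snippet below patches in a YaRN `rope_scaling` entry. The exact values are assumptions (a 4x factor over an assumed 32,768-token base yields the 131,072-token window); check the published `config.json` for the settings actually used:

```python
# Enable YaRN by patching config.json (values are illustrative assumptions).
import json

with open("config.json") as f:
    config = json.load(f)

config["max_position_embeddings"] = 131072  # 4x the assumed 32,768 base
config["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```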
When to Use This Model
This model is ideal for developers looking to:
- Accelerate GLM-4.5 Inference: Use it as a draft model to speed up generation with larger GLM-4.5 models (see the example after this list).
- Process Long Contexts: Leverage its 131k token context window for applications requiring extensive input or output lengths.
- Experiment with Speculative Decoding: Integrate a small, specialized model into speculative decoding pipelines with compatible GLM-4.5 models.
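Below is a minimal sketch of draft-assisted generation using Hugging Face `transformers` assisted generation, assuming the `zai-org/GLM-4.5-Air` repo id and sufficient GPU memory for the target model; in practice, inference servers such as llama.cpp or vLLM are common hosts for this pairing:

```python
# Minimal assisted-generation sketch (repo ids, dtype, and device settings are
# assumptions; GLM-4.5-Air needs substantial GPU memory to load).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air")
target = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air", torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "jukofyork/GLM-4.5-DRAFT-0.6B-v3.0", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Write a haiku about fast inference.", return_tensors="pt").to(target.device)

# The draft model proposes several tokens per step and the target model
# verifies them in parallel; because the vocabularies match after the
# transplant, no re-tokenization is needed between the two models.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Accepted draft tokens cost only a single verification pass on the target model, which is where the speedup comes from; the gain depends on how often the draft's proposals are accepted.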