jukofyork/GLM-4.5-DRAFT-0.6B-v3.0

License: apache-2.0

GLM-4.5-DRAFT-0.6B-v3.0: A Speculative Decoding Draft Model

jukofyork/GLM-4.5-DRAFT-0.6B-v3.0 is a compact 0.5-billion-parameter model engineered to serve as a draft model for speculative decoding, primarily with the GLM-4.5, GLM-4.5-Air, and GLM-4-32B-0414 series. It was initialized from Qwen2.5-0.5B-Instruct and underwent a vocabulary transplant using transplant-vocab to align its tokenizer with the GLM-4.5 family.
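To give a feel for what a vocabulary transplant involves, here is a conceptual Python sketch of the general idea: re-encode each target-vocabulary token with the donor tokenizer and average the donor sub-token embeddings. This is an illustration only, not the transplant-vocab implementation, and the target repo id (zai-org/GLM-4.5-Air) is an assumption.

```python
# Conceptual sketch of vocabulary transplantation (NOT the actual
# transplant-vocab tool): for each token in the target (GLM-4.5) vocabulary,
# re-encode its text with the donor (Qwen2.5) tokenizer and initialize the
# new embedding row as the mean of the donor sub-token embeddings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

donor = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
donor_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
target_tok = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air")  # assumed repo id

old_emb = donor.get_input_embeddings().weight.detach()  # [donor_vocab, hidden]
new_emb = torch.empty(len(target_tok), old_emb.shape[1])

for token_id in range(len(target_tok)):
    text = target_tok.decode([token_id])
    donor_ids = donor_tok.encode(text, add_special_tokens=False)
    if donor_ids:
        new_emb[token_id] = old_emb[donor_ids].mean(dim=0)
    else:
        new_emb[token_id] = old_emb.mean(dim=0)  # fallback for unmappable tokens

# Swap in the transplanted embedding table.
donor.resize_token_embeddings(len(target_tok))
donor.get_input_embeddings().weight.data.copy_(new_emb)
```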

Key Capabilities & Features

  • Speculative Decoding: Designed to accelerate inference when paired with larger GLM-4.5 base models.
  • Extended Context Window: Supports a context length of up to 131,072 tokens, achieved by enabling YaRN (Yet another RoPE extensioN method) through a modification to config.json (see the sketch after this list).
  • Training Data: Fine-tuned on a diverse dataset of approximately 2.3 billion tokens, including samples from agentlans/common-crawl-sample, bigcode/the-stack-smol-xl, and rombodawg/Everything_Instruct (output field only).
  • Efficient Training: Trained for a single epoch using qlora-pipe-lite with a batch size of 60 and a 32k sequence length.
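The YaRN extension mentioned above amounts to a small patch to config.json. Below is a minimal sketch assuming the Qwen2-style rope_scaling schema, with a factor of 4.0 derived from scaling a native 32,768-token window to 131,072; the exact values used for this model are not stated in this card.

```python
# Hedged sketch: enable YaRN by patching config.json. Assumes the
# Qwen2-style "rope_scaling" schema; factor 4.0 = 131072 / 32768.
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["max_position_embeddings"] = 131072
cfg["rope_scaling"] = {
    "type": "yarn",  # older transformers releases use "type"; newer ones also accept "rope_type"
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```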

When to Use This Model

This model is ideal for developers looking to:

  • Accelerate GLM-4.5 Inference: Utilize it as a draft model to significantly speed up the generation process of larger GLM-4.5 models.
  • Process Long Contexts: Leverage its 131k token context window for applications requiring extensive input or output lengths.
  • Experiment with Speculative Decoding: Integrate a small, specialized draft model into speculative decoding pipelines alongside compatible GLM-4.5 models (see the quick-start sketch below).
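As a quick start, here is a minimal sketch using the transformers library's assisted generation, which accepts a draft model via the `assistant_model` argument of `generate`. The target repo id (zai-org/GLM-4.5-Air), dtype, and device settings are assumptions; note that assisted generation requires the draft and target to share a tokenizer, which is exactly what the vocabulary transplant provides.

```python
# Minimal speculative-decoding sketch via transformers assisted generation.
# Repo ids, dtype, and device placement are assumptions; exact loading
# kwargs may vary with your transformers version and hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "zai-org/GLM-4.5-Air"  # assumed target checkpoint
draft_id = "jukofyork/GLM-4.5-DRAFT-0.6B-v3.0"

tok = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16
).to(target.device)

inputs = tok("Explain speculative decoding in one paragraph.", return_tensors="pt").to(target.device)

# The draft model proposes tokens; the target model verifies them in parallel.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```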