jukofyork/GLM-4.5-DRAFT-0.6B-v3.0 is a 0.5-billion-parameter draft model derived from Qwen2.5-0.5B-Instruct, designed specifically for speculative decoding with the larger GLM-4.5 series models. It supports an extended context length of up to 131,072 tokens via YaRN scaling, and is intended to serve as an efficient draft generator for GLM-4.5, GLM-4.5-Air, and GLM-4-32B-0414, improving inference speed in long-context applications.
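
A minimal sketch of how a draft model like this might be paired with a larger GLM-4.5 target for speculative decoding, assuming a vLLM-style offline API. The target repo name and the exact parameter names (`speculative_config`, `num_speculative_tokens`) are assumptions; they vary between serving stacks and vLLM versions, so check the documentation for the version you run.

```python
# Sketch: speculative decoding with a separate draft model via vLLM.
# Assumes a recent vLLM release that accepts a speculative_config dict;
# older releases expose similar options under different keyword names.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.5-Air",  # target model (illustrative repo name)
    speculative_config={
        "model": "jukofyork/GLM-4.5-DRAFT-0.6B-v3.0",  # draft model
        "num_speculative_tokens": 5,  # tokens the draft proposes per step
    },
)

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(
    ["Explain why a small draft model can speed up decoding of a large model."],
    params,
)
print(outputs[0].outputs[0].text)
```

The speed-up comes from the target model verifying several draft-proposed tokens in a single forward pass instead of generating them one at a time; a larger `num_speculative_tokens` helps only while the draft's acceptance rate stays high.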