MiniLLM/VanillaKD-Pretrain-Qwen-500M
TEXT GENERATION
Concurrency Cost: 1 · Model Size: 0.6B · Quant: BF16 · Ctx Length: 32k · Published: Oct 21, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

VanillaKD-Pretrain-Qwen-500M is a 0.6 billion parameter language model developed by MiniLLM and built on the Qwen architecture. It was pre-trained on 50 billion tokens of the Pile dataset using vanilla token-level knowledge distillation, with Qwen1.5-1.8B as the teacher model. The model serves as a vanilla-KD baseline for subsequent MiniLLM-Qwen-500M work and demonstrates strong performance for its compute budget.
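For intuition, vanilla token-level knowledge distillation trains the student to match the teacher's next-token distribution at every position, typically by minimizing the forward KL divergence from teacher to student. The sketch below illustrates this objective with NumPy; it is not the authors' training code, and the array shapes and function names are illustrative assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_level_kd_loss(student_logits, teacher_logits):
    """Vanilla token-level KD objective (illustrative sketch).

    Both inputs have shape (num_tokens, vocab_size). Returns the
    forward KL divergence KL(teacher || student), summed over the
    vocabulary and averaged over token positions.
    """
    p = softmax(teacher_logits)                 # teacher distribution per token
    log_q = np.log(softmax(student_logits))     # student log-probabilities
    kl = (p * (np.log(p) - log_q)).sum(axis=-1)  # per-token KL over vocab
    return kl.mean()

# Toy example: 3 token positions, vocabulary of 5.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(3, 5))
student = rng.normal(size=(3, 5))
loss = token_level_kd_loss(student, teacher)
```

When the student's logits exactly match the teacher's, the loss is zero; minimizing it pulls the student's per-token distribution toward the teacher's. In practice this loss would be computed on GPU (e.g. in PyTorch) over batches of sequences, often combined with the standard language-modeling loss on ground-truth tokens.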
