MiniLLM/MiniPLM-Qwen-500M is a 500M-parameter language model with the Qwen architecture, pre-trained from scratch using the MiniPLM knowledge distillation framework with the official Qwen1.5-1.8B as the teacher model. Developed by the MiniLLM team, MiniPLM trains smaller LMs efficiently: it improves language modeling ability and performance on 9 downstream tasks while reducing pre-training computation, and the framework also supports knowledge distillation across model families.
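
A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub under this model id and loads with the standard transformers Auto classes; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load MiniPLM-Qwen-500M and generate text.
# Assumes the checkpoint works with the standard Auto classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniLLM/MiniPLM-Qwen-500M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt and sampling parameters.
prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```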