turboderp/Qwama-0.5B-Instruct is a 0.5-billion-parameter instruction-tuned causal language model based on Qwen2-0.5B-Instruct, featuring a Llama-3 vocabulary. Developed by turboderp, it is primarily intended as a lightweight draft model for the larger Llama-3-70B-Instruct in speculative decoding. It also serves as an exploration of the feasibility and cost of vocabulary swaps between dissimilar language models.
turboderp/Qwama-0.5B-Instruct Overview
Qwama-0.5B-Instruct is a 0.5-billion-parameter instruction-tuned model: a modified version of Qwen2-0.5B-Instruct with a Llama-3 vocabulary. Its primary purpose is to act as a lightweight draft model for speculative decoding with larger models such as Llama-3-70B-Instruct, offering a less resource-intensive alternative to Llama-3-8B-Instruct in that role.
Key Features and Development
- Vocabulary Swap: The model's defining characteristic is its Llama-3 vocabulary, achieved by creating a new embedding layer initialized from the corresponding Qwen2 token embeddings. Each Llama-3 token was mapped to the Qwen2 token or tokens covering the same text; where a token mapped to multiple Qwen2 tokens, their embeddings were averaged.
- Finetuning: After the vocabulary swap, the model was finetuned to restore coherence: first on a 2.41-million-row sample from Common Crawl, then for three epochs on approximately 25,000 instruct-formatted completions generated by Llama-3-8B-Instruct.
- Performance: The vocabulary swap initially caused some degradation relative to the base Qwen2-0.5B-Instruct (Wikitext 2k perplexity rose from 12.57 to 15.34, and MMLU dropped from 43.83% to 40.37%), but the model is an effective speculative-decoding draft: paired with Llama-3-70B-Instruct it achieves a 3.72x speedup on code and 1.92x on prose, outperforming Qwen2-0.5B-Instruct used as a draft for Qwen2-72B-Instruct.
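The embedding-initialization step described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the author's actual script: the function name and the tokenizer interface are hypothetical stand-ins for the Hugging Face tokenizer API.

```python
# Sketch of the vocabulary-swap initialization: build a Llama-3-vocab
# embedding matrix from Qwen2 embeddings. Hypothetical helper, not the
# author's exact procedure.
import torch


def init_swapped_embeddings(qwen_tokenizer, llama_tokenizer, qwen_embed):
    """For each Llama-3 token, decode it to text, re-encode that text with
    the Qwen2 tokenizer, and average the matching Qwen2 embedding rows
    (a single-token match just copies the row)."""
    vocab_size = len(llama_tokenizer)
    dim = qwen_embed.shape[1]
    new_embed = torch.zeros(vocab_size, dim)
    for token_id in range(vocab_size):
        text = llama_tokenizer.decode([token_id])
        qwen_ids = qwen_tokenizer.encode(text, add_special_tokens=False)
        if qwen_ids:  # leave unmappable tokens zero-initialized
            new_embed[token_id] = qwen_embed[qwen_ids].mean(dim=0)
    return new_embed
```

The averaging is only a starting point; as the finetuning note above indicates, the swapped model still needs further training before the new embeddings behave coherently.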
Use Cases
- Speculative Decoding: Ideal for accelerating inference on larger Llama-3 models by serving as a fast, lightweight draft model.
- Research: Provides a practical example for exploring the viability and challenges of vocabulary transplantation between different language models.
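For context, the draft-then-verify loop that makes such a draft model useful can be sketched in a simplified greedy form. This is a conceptual sketch only: production implementations (e.g. ExLlamaV2, or `assistant_model` in Hugging Face `generate`) verify against the target model's probability distribution in a single batched forward pass, and the `speculative_step` function and its stub model interfaces here are hypothetical.

```python
# Greedy speculative decoding, simplified: the draft model proposes k tokens,
# the target model checks them, and the longest agreeing prefix is accepted.
def speculative_step(draft_next, target_next, prefix, k=4):
    """One draft-then-verify round.

    draft_next / target_next: callables mapping a token sequence to the
    greedily chosen next token (stand-ins for real model forward passes).
    Returns the prefix extended by every accepted draft token plus one
    token chosen by the target model.
    """
    # Draft model proposes k tokens autoregressively (cheap).
    proposed = []
    seq = list(prefix)
    for _ in range(k):
        token = draft_next(seq)
        proposed.append(token)
        seq.append(token)
    # Target model verifies: accept while its greedy choice agrees.
    out = list(prefix)
    for token in proposed:
        expected = target_next(out)
        if expected != token:
            out.append(expected)  # first mismatch: keep the target's token
            return out
        out.append(token)
    out.append(target_next(out))  # all accepted: target adds one more token
    return out
```

The speedup reported above comes from this structure: every accepted draft token saves a sequential forward pass through the 70B target, which is why agreement-heavy output like code (3.72x) benefits more than prose (1.92x).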