Overview
This model, astronomer/Llama-3-70B-Special-Tokens-Adjusted, is a modified version of Meta's Llama-3-70B base model, developed by Astronomer. Its purpose is to fix an issue in the original Llama 3 base model, in which certain special tokens had untrained embedding weights. Left unfixed, these untrained embeddings can destabilize fine-tuning, causing gradient explosions or NaN gradients.
Key Adjustments
Astronomer identified the affected tokens and adjusted both their input and output embedding weights. The adjustment sets the embedding of each untrained token to the mean of the trained tokens' embeddings, so that these tokens start fine-tuning from a reasonable initialization. The 70B variant of Llama 3 had less severe issues with zero-valued embeddings than the 8B model, but the adjusted version still provides a more stable foundation for fine-tuning.
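The mean-reset idea described above can be illustrated with a minimal sketch. This is not Astronomer's actual script. The function name `reset_untrained_rows`, the `atol` parameter, and the rule that a row counts as untrained when all its entries are (near) zero are assumptions made for illustration, and the sketch uses a toy NumPy matrix rather than real model weights.

```python
import numpy as np


def reset_untrained_rows(embeddings: np.ndarray, atol: float = 0.0):
    """Replace untrained rows with the mean of the trained rows.

    Assumption (not from the model card): a row is treated as untrained
    when every entry has absolute value <= atol.
    Returns a fixed copy and the indices of the rows that were replaced.
    """
    untrained = np.all(np.abs(embeddings) <= atol, axis=1)
    if untrained.all():
        raise ValueError("no trained rows available to average")
    fixed = embeddings.copy()
    # Mean over trained rows only, so zero rows do not drag the mean down.
    fixed[untrained] = embeddings[~untrained].mean(axis=0)
    return fixed, np.flatnonzero(untrained)


# Toy embedding matrix: 4 tokens, 2 dimensions; token 2 is untrained.
emb = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [5.0, 6.0]])
fixed, replaced = reset_untrained_rows(emb)
print(replaced)   # indices of the rows that were reset
print(fixed[2])   # now the mean of rows 0, 1 and 3
```

In a real model the same operation would be applied separately to the input embedding matrix and to the output (LM head) matrix, since the model card states that both were adjusted.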
Why Use This Model?
- Enhanced Fine-tuning Stability: Fixes the untrained special-token embeddings, a known source of exploding or NaN gradients during fine-tuning.
- Reliable Base Model: Offers a more dependable starting point for developers who fine-tune Llama-3-70B for specific applications or with custom token sets.
- Community-Driven Solution: Created in response to community requests to fix a known flaw in the original Llama 3 base model.
This model is particularly beneficial for developers who plan to extend Llama-3-70B with new tokens or who rely heavily on its special tokens for instruction following, since those are the cases in which untrained embeddings are most likely to affect training.