astronomer/Llama-3-8B-Special-Tokens-Adjusted
astronomer/Llama-3-8B-Special-Tokens-Adjusted is an 8-billion-parameter Llama 3 family model developed by David Xue at Astronomer. It is a patched version of Meta's Llama-3-8B in which the input and output embedding weights have been adjusted to resolve issues caused by untrained special tokens. The patch targets stable fine-tuning: it prevents the gradient explosions and NaN gradients that can occur with the original Llama 3 base model.
Overview
Llama-3-8B-Special-Tokens-Adjusted is an 8-billion-parameter variant of Meta's Llama 3 base model, developed by David Xue at Astronomer. Its purpose is to provide a stable foundation for fine-tuning by fixing a flaw in the original Llama 3 base model: certain special tokens have untrained input and output embeddings.
Key Adjustments and Benefits
Some special tokens in Meta's original Llama 3 base model have all-zero embeddings, which can cause training instabilities such as gradient explosions or NaN gradients during fine-tuning. This adjusted model resolves the issue by:
- Identifying Untrained Tokens: Locating special tokens within the embedding matrices that had all-zero embedding values.
- Calculating Mean Embeddings: Computing the average embedding values from the properly trained tokens.
- Applying Adjustments: Replacing the zero-valued embeddings of the problematic tokens with the calculated mean embeddings.
With this adjustment, developers can fine-tune the Llama 3 base model without running into the gradient explosions or NaN gradients that these special tokens can cause.
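The three steps above can be illustrated with a minimal, dependency-free sketch. It is not the author's actual patching script: the function name `patch_untrained_rows` and the toy 4x2 matrix are hypothetical, and a real run would apply the same logic to the model's input and output embedding tensors rather than to nested Python lists.

```python
def patch_untrained_rows(embeddings):
    """Replace every all-zero row with the mean of the remaining rows.

    `embeddings` is a list of equal-length lists of floats, one row per token.
    Returns (patched_copy, indices_of_rows_that_were_replaced).
    """
    # Step 1: identify untrained tokens (rows whose values are all zero).
    untrained = [i for i, row in enumerate(embeddings)
                 if all(v == 0.0 for v in row)]
    untrained_set = set(untrained)

    # Step 2: compute the mean embedding over the trained rows only.
    trained = [row for i, row in enumerate(embeddings)
               if i not in untrained_set]
    if not trained:
        raise ValueError("no trained rows to average over")
    dim = len(embeddings[0])
    mean_row = [sum(row[d] for row in trained) / len(trained)
                for d in range(dim)]

    # Step 3: write the mean into each untrained row, leaving the input intact.
    patched = [list(row) for row in embeddings]
    for i in untrained:
        patched[i] = list(mean_row)
    return patched, untrained


# Toy example: token 2 plays the role of an untrained special token.
toy = [
    [1.0, 2.0],
    [3.0, 4.0],
    [0.0, 0.0],
    [5.0, 6.0],
]
patched, untrained_ids = patch_untrained_rows(toy)
print(untrained_ids)  # [2]
print(patched[2])     # [3.0, 4.0]
```

The mean is taken over the trained rows only, as the second step describes, so the zero rows do not pull the average toward zero.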
Ideal Use Case
This model is intended for developers and researchers who want to fine-tune the Llama 3 8B base model. It provides a stable starting point and removes the need for pre-processing or workarounds to fix the untrained special tokens in the original Meta Llama 3 base model.