imone/Llama-3-8B-fixed-special-embedding Overview
This model is a specialized variant of the 8 billion parameter Llama 3 base model, designed to resolve a specific technical issue related to special token embeddings. The original Llama 3 8B base model had zero-initialized weights for certain special tokens, which could lead to NaN (Not a Number) gradients during training or fine-tuning processes.
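To illustrate the underlying issue, here is a minimal sketch (assuming PyTorch-style weight tensors; the function name is illustrative, not part of any released code) that scans an embedding matrix for zero-initialized rows, the kind of rows that can produce NaN gradients during fine-tuning:

```python
import torch

def find_zero_rows(weight: torch.Tensor) -> list[int]:
    """Return indices of rows that are entirely zero (i.e. zero-initialized
    token embeddings). Illustrative helper, not from the model's repo."""
    norms = weight.norm(dim=1)          # L2 norm of each token's embedding row
    return torch.nonzero(norms == 0).flatten().tolist()
```

Running this check on the original base model's embedding matrix would flag the affected special-token rows before training begins.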
Key Modifications
- Special Token Re-initialization: The weights for the `<|eot_id|>`, `<|start_header_id|>`, and `<|end_header_id|>` tokens have been re-initialized.
- Weight Assignment: The new weights for these special tokens in both the `embed` and `lm_head` layers are set to the mean of all other token weights, specifically up to a `mean_cutoff` of 128000.
- Gradient Stability: This modification aims to prevent NaN gradients, thereby improving the stability and reliability of further training or fine-tuning operations.
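The re-initialization described above can be sketched as follows (a minimal illustration assuming PyTorch weight tensors; the function name and arguments are hypothetical, not the repository's actual script):

```python
import torch

def reinit_special_tokens(embed_weight: torch.Tensor,
                          lm_head_weight: torch.Tensor,
                          special_ids: list[int],
                          mean_cutoff: int = 128000) -> None:
    """Set each special token's row, in both the embedding and lm_head
    matrices, to the mean of the first `mean_cutoff` token rows (in place)."""
    with torch.no_grad():
        for weight in (embed_weight, lm_head_weight):
            mean_vec = weight[:mean_cutoff].mean(dim=0)
            for tok_id in special_ids:
                weight[tok_id] = mean_vec
```

Averaging over the regular vocabulary gives the special tokens a plausible, non-zero starting point, so their gradients stay finite from the first training step.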
Good for
- Developers encountering NaN gradient issues with the original Llama 3 8B base model during fine-tuning.
- Ensuring more stable training runs when working with Llama 3 models that utilize these specific special tokens.
- As a foundational model for custom applications where robust training behavior is critical.