imone/Llama-3-8B-fixed-special-embedding
Text generation · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Apr 21, 2024 · License: llama3 · Architecture: Transformer
imone/Llama-3-8B-fixed-special-embedding is an 8 billion parameter Llama 3 model with an 8192-token context length. This version addresses potential NaN gradients by re-initializing the weights of specific special tokens (`<|eot_id|>`, `<|start_header_id|>`, `<|end_header_id|>`). It is optimized for stable training and fine-tuning of Llama 3 models where these special tokens might otherwise cause issues.
imone/Llama-3-8B-fixed-special-embedding Overview
This model is a specialized variant of the 8 billion parameter Llama 3 base model, designed to resolve a specific technical issue related to special token embeddings. The original Llama 3 8B base model had zero-initialized weights for certain special tokens, which could lead to NaN (Not a Number) gradients during training or fine-tuning processes.
Key Modifications
- Special Token Re-initialization: The weights for the `<|eot_id|>`, `<|start_header_id|>`, and `<|end_header_id|>` tokens have been re-initialized.
- Weight Assignment: The new weights for these special tokens in both the `embed` and `lm_head` layers are set to the mean of all other token weights, specifically up to a `mean_cutoff` of 128000.
- Gradient Stability: This modification aims to prevent NaN gradients, thereby improving the stability and reliability of further training or fine-tuning operations.
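The re-initialization described above amounts to a small weight-surgery step on the embedding matrices. A minimal sketch in PyTorch (not the author's actual script): a toy 10-token vocabulary stands in for the real one, with rows 7-9 playing the role of the special tokens and `mean_cutoff=7` standing in for 128000.

```python
import torch


def reinit_special_tokens(weight: torch.Tensor, special_ids, mean_cutoff):
    """Set each special-token row to the mean of the first `mean_cutoff` rows."""
    with torch.no_grad():
        mean_vec = weight[:mean_cutoff].mean(dim=0)
        for tok_id in special_ids:
            weight[tok_id] = mean_vec
    return weight


# In the real model this would be applied to both the input embedding and
# `lm_head` matrices, with the token ids of <|eot_id|>, <|start_header_id|>,
# and <|end_header_id|> and mean_cutoff=128000. Toy stand-in:
w = torch.randn(10, 4)      # (vocab_size, hidden_dim)
w[7:] = 0.0                 # zero-initialized special rows, as in the base model
reinit_special_tokens(w, special_ids=[7, 8, 9], mean_cutoff=7)
```

Because the mean is taken only over the first `mean_cutoff` rows, the zero-initialized special rows never contaminate it, and every special token ends up with the same well-scaled, non-zero embedding.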
Good for
- Developers encountering NaN gradient issues with the original Llama 3 8B base model during fine-tuning.
- Ensuring more stable training runs when working with Llama 3 models that utilize these specific special tokens.
- Serving as a foundation for custom applications where robust training behavior is critical.