imone/Llama-3-8B-fixed-special-embedding is an 8-billion-parameter Llama 3 model with a context length of 8192 tokens. This version addresses potential NaN gradients by re-initializing the weights of three special tokens: <|eot_id|>, <|start_header_id|>, and <|end_header_id|>. It is intended for stable training and fine-tuning of Llama 3 in setups where these special tokens might otherwise cause issues.
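As a rough illustration of what re-initializing special-token weights can look like, the sketch below replaces selected rows of a toy embedding matrix with the mean of the remaining rows. The mean-of-other-rows strategy, the function name `reinit_rows`, and the toy token ids and values are all assumptions made for illustration; the description above does not say how this model's weights were actually re-initialized.

```python
def reinit_rows(embeddings, special_ids):
    """Return a copy of `embeddings` where each row listed in `special_ids`
    is replaced by the element-wise mean of all other rows.

    Assumed strategy for illustration only, not the model author's
    confirmed method.
    """
    special = set(special_ids)
    others = [row for i, row in enumerate(embeddings) if i not in special]
    if not others:
        raise ValueError("need at least one non-special row to average")
    dim = len(others[0])
    mean_row = [sum(row[d] for row in others) / len(others) for d in range(dim)]
    # Build a new matrix so the input is left unmodified.
    return [list(mean_row) if i in special else list(row)
            for i, row in enumerate(embeddings)]


# Toy 4-token vocabulary with 2-dimensional embeddings. Ids 2 and 3 are
# hypothetical stand-ins for special tokens with problematic rows.
toy = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
fixed = reinit_rows(toy, special_ids=[2, 3])
print(fixed[2])
```

In a real fine-tuning setup the same idea would be applied to the model's input embedding matrix (and possibly its output projection) at the ids the tokenizer assigns to the special tokens; those details depend on the framework and are not covered by this sketch.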