imone/Llama-3-8B-fixed-special-embedding

Text generation · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Apr 21, 2024 · License: llama3 · Architecture: Transformer

imone/Llama-3-8B-fixed-special-embedding is an 8 billion parameter Llama 3 model with a fixed context length of 8192 tokens. This version addresses potential NaN gradients by re-initializing the weights of the special tokens <|eot_id|>, <|start_header_id|>, and <|end_header_id|>. It is optimized for stable training and fine-tuning of Llama 3 models where these special tokens might otherwise cause issues.


imone/Llama-3-8B-fixed-special-embedding Overview

This model is a specialized variant of the 8 billion parameter Llama 3 base model, designed to resolve a specific technical issue related to special token embeddings. The original Llama 3 8B base model had zero-initialized weights for certain special tokens, which could lead to NaN (Not a Number) gradients during training or fine-tuning processes.

Key Modifications

  • Special Token Re-initialization: The weights for <|eot_id|>, <|start_header_id|>, and <|end_header_id|> tokens have been re-initialized.
  • Weight Assignment: The new weights for these special tokens, in both the embedding (embed) and output (lm_head) layers, are set to the mean of the preceding token weights, specifically the rows below a mean_cutoff of 128000.
  • Gradient Stability: This modification aims to prevent NaN gradients, thereby improving the stability and reliability of further training or fine-tuning operations.
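The repository does not publish the exact fix-up script, but the modification described above can be sketched as follows. This is a minimal illustration using PyTorch with toy matrix dimensions and hypothetical special-token ids; the real model uses the Llama 3 vocabulary, with mean_cutoff = 128000 and the ids of <|eot_id|>, <|start_header_id|>, and <|end_header_id|>.

```python
import torch

# Toy dimensions for illustration only (hypothetical values; the real
# model's vocabulary and hidden size are much larger).
vocab_size, hidden, mean_cutoff = 130, 8, 100

# Stand-ins for the model's embedding and lm_head weight matrices.
# In the original base model, the special-token rows were zero-initialized.
embed = torch.randn(vocab_size, hidden)
lm_head = torch.randn(vocab_size, hidden)

# Hypothetical ids of the special tokens being fixed.
special_ids = [101, 102, 103]

with torch.no_grad():
    for w in (embed, lm_head):
        # Mean over the "ordinary" token rows, i.e. rows below mean_cutoff.
        mean_row = w[:mean_cutoff].mean(dim=0)
        # Overwrite each special-token row with that mean.
        for tid in special_ids:
            w[tid] = mean_row

# The special rows are now non-zero and finite, avoiding the
# zero-initialized embeddings that could produce NaN gradients.
assert torch.allclose(embed[101], embed[:mean_cutoff].mean(dim=0))
assert not torch.isnan(lm_head).any()
```

Overwriting both the input embedding and the lm_head rows keeps the tied representations consistent, and using the mean of existing rows places the new vectors at a plausible point in embedding space rather than at zero.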

Good for

  • Developers encountering NaN gradient issues with the original Llama 3 8B base model during fine-tuning.
  • Ensuring more stable training runs when working with Llama 3 models that utilize these specific special tokens.
  • Serving as a foundational model for custom applications where robust training behavior is critical.