imone/Llama-3-8B-fixed-special-embedding

8B | FP8 | Context length: 8192 | Apr 21, 2024 | License: llama3

imone/Llama-3-8B-fixed-special-embedding Overview

This model is a variant of the 8-billion-parameter Llama 3 base model that fixes a specific problem with special-token embeddings. In the original Llama 3 8B base model, the weights for certain special tokens are zero-initialized, which can cause NaN (Not a Number) gradients during training or fine-tuning.
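One way to check whether a checkpoint is affected is to look for embedding rows whose norm is zero. The sketch below is a minimal, dependency-free illustration on a toy matrix stored as a list of rows (one row per token ID); the helper name `find_zero_rows` is hypothetical, and with a real checkpoint the same test would be applied to the embedding weights loaded from the model.

```python
import math
from typing import List, Sequence


def find_zero_rows(
    weight: Sequence[Sequence[float]], start: int = 0, tol: float = 0.0
) -> List[int]:
    """Return the indices i >= start whose row in `weight` has an L2 norm <= tol.

    Hypothetical helper: `weight` is a toy embedding matrix given as a list of
    rows, one row per token ID.
    """
    return [
        i
        for i in range(start, len(weight))
        if math.sqrt(sum(v * v for v in weight[i])) <= tol
    ]


# Toy 4-token, 2-dimensional embedding matrix; rows 1 and 3 are all zeros.
toy_embed = [[0.5, -1.0], [0.0, 0.0], [0.2, 0.1], [0.0, 0.0]]
print(find_zero_rows(toy_embed))           # [1, 3]
print(find_zero_rows(toy_embed, start=2))  # [3]
```

For Llama 3 8B, passing `start=128000` would restrict the check to the special-token ID range, and a small positive `tol` would also catch rows that are near zero rather than exactly zero.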

Key Modifications

  • Special Token Re-initialization: The weights for the <|eot_id|>, <|start_header_id|>, and <|end_header_id|> tokens have been re-initialized.
  • Weight Assignment: In both the input embedding (embed) and output projection (lm_head) layers, each of these tokens' rows is set to the mean of the rows for token IDs below mean_cutoff = 128000, i.e. the regular vocabulary. In Llama 3, IDs 128000 and above are reserved for special tokens, so the affected rows are excluded from the average.
  • Gradient Stability: Replacing the zero-initialized rows is intended to prevent NaN gradients, making further training or fine-tuning more stable.
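The re-initialization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used to produce this checkpoint: matrices are plain lists of rows, the function name `reinit_rows_with_mean` is hypothetical, and the toy example uses a cutoff of 2 in place of 128000. The IDs in `LLAMA3_SPECIAL_IDS` are the ones the Llama 3 tokenizer assigns to the three special tokens.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]

# Llama 3 tokenizer IDs for the three re-initialized special tokens.
LLAMA3_SPECIAL_IDS: Dict[str, int] = {
    "<|start_header_id|>": 128006,
    "<|end_header_id|>": 128007,
    "<|eot_id|>": 128009,
}


def reinit_rows_with_mean(
    weight: Matrix, token_ids: Sequence[int], mean_cutoff: int
) -> Matrix:
    """Return a copy of `weight` in which each row listed in `token_ids` is
    replaced by the element-wise mean of rows 0 .. mean_cutoff - 1."""
    if not 0 < mean_cutoff <= len(weight):
        raise ValueError("mean_cutoff must be between 1 and the number of rows")
    dim = len(weight[0])
    mean_row = [
        sum(weight[r][c] for r in range(mean_cutoff)) / mean_cutoff
        for c in range(dim)
    ]
    fixed = [list(row) for row in weight]  # copy so the input is left untouched
    for tid in token_ids:
        fixed[tid] = list(mean_row)
    return fixed


# Toy example: 2 "regular" tokens (IDs 0-1) and 2 zeroed "special" tokens (IDs 2-3).
# The same update is applied to both the embedding and the output-projection matrix.
toy = {
    "embed": [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]],
    "lm_head": [[0.0, 1.0], [2.0, -1.0], [0.0, 0.0], [0.0, 0.0]],
}
fixed = {name: reinit_rows_with_mean(w, [2, 3], mean_cutoff=2) for name, w in toy.items()}
print(fixed["embed"][2])    # [2.0, 3.0]
print(fixed["lm_head"][3])  # [1.0, 0.0]
```

With a real checkpoint loaded as PyTorch tensors, an equivalent in-place update (inside `torch.no_grad()`) would be along the lines of `w[ids] = w[:128000].mean(dim=0)` for each of the two weight matrices. Averaging only over IDs below the cutoff keeps the zeroed special-token rows out of the mean.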

Good for

  • Developers encountering NaN gradient issues with the original Llama 3 8B base model during fine-tuning.
  • More stable training runs when fine-tuning on data that uses these special tokens, such as the Llama 3 chat format.
  • A base model for custom applications where reliable training behavior is critical.