imone/Llama-3-8B-fixed-special-embedding Overview
This model is a specialized variant of the 8 billion parameter Llama 3 base model, designed to resolve a specific technical issue related to special token embeddings. The original Llama 3 8B base model had zero-initialized weights for certain special tokens, which could lead to NaN (Not a Number) gradients during training or fine-tuning processes.
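To illustrate the underlying issue, here is a minimal sketch (assuming PyTorch-style weight tensors; the function name is illustrative, not part of any released code) that scans an embedding matrix for zero-initialized rows, the kind of rows that can produce NaN gradients during fine-tuning:

```python
import torch

def find_zero_rows(weight: torch.Tensor) -> list[int]:
    """Return indices of rows that are entirely zero (i.e. zero-initialized
    token embeddings). Illustrative helper, not from the model's repo."""
    norms = weight.norm(dim=1)          # L2 norm of each token's embedding row
    return torch.nonzero(norms == 0).flatten().tolist()
```

Running this check on the original base model's embedding matrix would flag the affected special-token rows before training begins.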
Key Modifications
- Special Token Re-initialization: The weights for the `<|eot_id|>`, `<|start_header_id|>`, and `<|end_header_id|>` tokens have been re-initialized.
- Weight Assignment: The new weights for these special tokens in both the `embed` and `lm_head` layers are set to the mean of all other token weights, specifically up to a `mean_cutoff` of 128000.
- Gradient Stability: This modification aims to prevent NaN gradients, thereby improving the stability and reliability of further training or fine-tuning operations.
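The re-initialization described above can be sketched as follows (a minimal illustration assuming PyTorch weight tensors; the function name and arguments are hypothetical, not the repository's actual script):

```python
import torch

def reinit_special_tokens(embed_weight: torch.Tensor,
                          lm_head_weight: torch.Tensor,
                          special_ids: list[int],
                          mean_cutoff: int = 128000) -> None:
    """Set each special token's row, in both the embedding and lm_head
    matrices, to the mean of the first `mean_cutoff` token rows (in place)."""
    with torch.no_grad():
        for weight in (embed_weight, lm_head_weight):
            mean_vec = weight[:mean_cutoff].mean(dim=0)
            for tok_id in special_ids:
                weight[tok_id] = mean_vec
```

Averaging over the regular vocabulary gives the special tokens a plausible, non-zero starting point, so their gradients stay finite from the first training step.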
Good for
- Developers encountering NaN gradient issues with the original Llama 3 8B base model during fine-tuning.
- Ensuring more stable training runs when working with Llama 3 models that utilize these specific special tokens.
- As a foundational model for custom applications where robust training behavior is critical.