astronomer/Llama-3-8B-Special-Tokens-Adjusted

Hugging Face

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Context Length: 8K
  • Published: Apr 22, 2024
  • License: llama-3
  • Architecture: Transformer

astronomer/Llama-3-8B-Special-Tokens-Adjusted is an 8-billion-parameter Llama 3 family model developed by David Xue at Astronomer. It is a patched version of Meta's Llama-3-8B in which the input and output embedding weights have been adjusted to resolve issues caused by untrained special tokens. The result is a base model optimized for stable fine-tuning, avoiding the gradient explosions and NaN gradients that can occur with the original Llama 3 base model.


Overview

This model, Llama-3-8B-Special-Tokens-Adjusted, is an 8-billion-parameter variant of Meta's Llama 3 base model, developed by David Xue at Astronomer. Its primary purpose is to provide a stable foundation for fine-tuning by addressing a critical flaw in the original Llama 3 base model: certain special tokens shipped with untrained input and output embeddings.

Key Adjustments and Benefits

Meta's original Llama 3 base model contained special tokens whose embeddings were all zeros, which can lead to training instabilities such as gradient explosions or NaN gradients during fine-tuning. This adjusted model resolves the issue by:

  • Identifying Untrained Tokens: Locating special tokens within the embedding matrices that had all-zero embedding values.
  • Calculating Mean Embeddings: Computing the average embedding values from the properly trained tokens.
  • Applying Adjustments: Replacing the zero-valued embeddings of the problematic tokens with the calculated mean embeddings.
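
The procedure can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming access to the original meta-llama/Meta-Llama-3-8B weights; it is not the author's actual patch script:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the original base model (assumed starting point for the patch).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

with torch.no_grad():
    for embeddings in (
        model.get_input_embeddings().weight,   # input embedding matrix
        model.get_output_embeddings().weight,  # output (lm_head) matrix
    ):
        # Untrained special tokens show up as all-zero embedding rows.
        untrained = (embeddings == 0).all(dim=-1)
        # Mean embedding computed only over the properly trained rows.
        mean_embedding = embeddings[~untrained].mean(dim=0)
        # Replace each zero-valued row with the mean embedding.
        embeddings[untrained] = mean_embedding

model.save_pretrained("Llama-3-8B-Special-Tokens-Adjusted")
```

Replacing the zero rows with the mean of the trained rows keeps the patched tokens in-distribution, so gradients flowing through them remain finite during fine-tuning.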

This adjustment ensures that developers can fine-tune the Llama 3 base model without encountering common training hurdles related to these special tokens.

Ideal Use Case

This model is specifically designed for developers and researchers who intend to fine-tune the Llama 3 8B architecture. It provides a robust, stable starting point and removes the need for custom pre-processing or workarounds to fix the untrained special-token issue present in Meta's original Llama 3 base model.
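
As an illustrative starting point, loading the adjusted model for fine-tuning looks the same as loading the original base model; only the repository ID changes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Drop-in replacement for the original base model when starting a fine-tune.
model_id = "astronomer/Llama-3-8B-Special-Tokens-Adjusted"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
```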

Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model cover the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
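
For illustration, such sampler parameters could be passed through Featherless's OpenAI-compatible API. The base URL, the extra_body pass-through for non-standard samplers, and all parameter values below are assumptions for the sketch, not the actual user configurations:

```python
from openai import OpenAI

# Assumed endpoint for Featherless's OpenAI-compatible API.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key="YOUR_FEATHERLESS_API_KEY",
)

# Base (non-instruct) model, so the plain completions endpoint fits best.
response = client.completions.create(
    model="astronomer/Llama-3-8B-Special-Tokens-Adjusted",
    prompt="The Llama 3 architecture is",
    max_tokens=64,
    temperature=0.7,        # illustrative values only
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the OpenAI spec are commonly passed via extra_body
    # on compatible servers; exact support varies by host.
    extra_body={"top_k": 40, "repetition_penalty": 1.1, "min_p": 0.05},
)
print(response.choices[0].text)
```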