astronomer/Llama-3-8B-Special-Tokens-Adjusted

Hugging Face

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Context Length: 8K
  • Published: Apr 22, 2024
  • License: llama-3
  • Architecture: Transformer

astronomer/Llama-3-8B-Special-Tokens-Adjusted is an 8-billion-parameter Llama 3 family model developed by David Xue at Astronomer. It is a patched version of Meta's Llama-3-8B in which the input and output embedding weights have been adjusted to resolve issues caused by untrained special tokens. The result is a base model optimized for stable fine-tuning, avoiding the gradient explosions and NaN gradients that can occur with the original Llama 3 base model.


Overview

This model, Llama-3-8B-Special-Tokens-Adjusted, is an 8-billion-parameter variant of Meta's Llama 3 base model, developed by David Xue at Astronomer. Its primary purpose is to provide a stable foundation for fine-tuning by addressing a critical flaw in the original Llama 3 base model: certain special tokens shipped with untrained input and output embeddings.

Key Adjustments and Benefits

Meta's original Llama 3 base model contained special tokens whose embeddings were all zeros, which can lead to training instabilities such as gradient explosions or NaN gradients during fine-tuning. This adjusted model resolves the issue by:

  • Identifying Untrained Tokens: Locating special tokens within the embedding matrices that had all-zero embedding values.
  • Calculating Mean Embeddings: Computing the average embedding values from the properly trained tokens.
  • Applying Adjustments: Replacing the zero-valued embeddings of the problematic tokens with the calculated mean embeddings.
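
The procedure can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming access to the original meta-llama/Meta-Llama-3-8B weights; it is not the author's actual patch script:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the original base model (assumed starting point for the patch).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

with torch.no_grad():
    for embeddings in (
        model.get_input_embeddings().weight,   # input embedding matrix
        model.get_output_embeddings().weight,  # output (lm_head) matrix
    ):
        # Untrained special tokens show up as all-zero embedding rows.
        untrained = (embeddings == 0).all(dim=-1)
        # Mean embedding computed only over the properly trained rows.
        mean_embedding = embeddings[~untrained].mean(dim=0)
        # Replace each zero-valued row with the mean embedding.
        embeddings[untrained] = mean_embedding

model.save_pretrained("Llama-3-8B-Special-Tokens-Adjusted")
```

Replacing the zero rows with the mean of the trained rows keeps the patched tokens in-distribution, so gradients flowing through them remain finite during fine-tuning.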

This adjustment ensures that developers can fine-tune the Llama 3 base model without encountering common training hurdles related to these special tokens.

Ideal Use Case

This model is specifically designed for developers and researchers who intend to fine-tune the Llama 3 8B architecture. It provides a robust, stable starting point and removes the need for custom pre-processing or workarounds to fix the untrained special-token issue present in Meta's original Llama 3 base model.
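
As an illustrative starting point, loading the adjusted model for fine-tuning looks the same as loading the original base model; only the repository ID changes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Drop-in replacement for the original base model when starting a fine-tune.
model_id = "astronomer/Llama-3-8B-Special-Tokens-Adjusted"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
```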

Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model cover the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
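
For illustration, such sampler parameters could be passed through Featherless's OpenAI-compatible API. The base URL, the extra_body pass-through for non-standard samplers, and all parameter values below are assumptions for the sketch, not the actual user configurations:

```python
from openai import OpenAI

# Assumed endpoint for Featherless's OpenAI-compatible API.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key="YOUR_FEATHERLESS_API_KEY",
)

# Base (non-instruct) model, so the plain completions endpoint fits best.
response = client.completions.create(
    model="astronomer/Llama-3-8B-Special-Tokens-Adjusted",
    prompt="The Llama 3 architecture is",
    max_tokens=64,
    temperature=0.7,        # illustrative values only
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the OpenAI spec are commonly passed via extra_body
    # on compatible servers; exact support varies by host.
    extra_body={"top_k": 40, "repetition_penalty": 1.1, "min_p": 0.05},
)
print(response.choices[0].text)
```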