astronomer/Llama-3-70B-Special-Tokens-Adjusted

Hugging Face
Text generation · Concurrency cost: 4 · Model size: 70B · Quantization: FP8 · Context length: 8k · License: llama-3 · Architecture: Transformer

The astronomer/Llama-3-70B-Special-Tokens-Adjusted is a 70 billion parameter Llama 3 model, developed by Astronomer, with its input and output embedding weights adjusted for previously untrained special tokens. This modification addresses potential fine-tuning instabilities like gradient explosions or NaN gradients, making it an ideal and stable base for further fine-tuning. It maintains the original Llama 3 architecture and a context length of 8192 tokens, specifically optimized to prevent issues when adding custom tokens or utilizing existing special tokens.


Overview

This model, astronomer/Llama-3-70B-Special-Tokens-Adjusted, is a modified version of Meta's Llama-3-70B base model, developed by Astronomer. Its primary purpose is to resolve a critical issue in the original Llama 3 base model where certain special tokens had untrained embedding weights. This oversight could lead to significant instability, such as gradient explosions or NaN gradients, during subsequent fine-tuning processes.

Key Adjustments

Astronomer identified and adjusted the input and output embedding weights for these problematic tokens. The adjustment involved setting the embedding values of these untrained tokens to the mean of the trained tokens, ensuring they are properly initialized for downstream tasks. While the 70B variant of Llama 3 had less severe issues with zero-valued embeddings compared to the 8B model, this adjusted version provides a more robust and stable foundation for fine-tuning.
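The mean-initialization step described above can be sketched in plain PyTorch. This is a minimal illustration of the technique, not Astronomer's actual adjustment script; the vocabulary size, embedding dimension, and token IDs below are made up for the demo.

```python
import torch

# Toy embedding matrix: 10 tokens, 4 dimensions. Pretend the last two
# rows (IDs 8 and 9) are untrained special tokens left at zero.
embeddings = torch.randn(10, 4)
untrained_ids = [8, 9]
embeddings[untrained_ids] = 0.0

# Boolean mask selecting the trained rows.
trained_mask = torch.ones(embeddings.size(0), dtype=torch.bool)
trained_mask[untrained_ids] = False

# Set each untrained row to the mean of the trained rows, so the
# special tokens start from a well-conditioned point for fine-tuning.
mean_embedding = embeddings[trained_mask].mean(dim=0)
embeddings[untrained_ids] = mean_embedding
```

In the real model the same operation is applied to both the input embedding matrix and the output (LM head) weights, which is why fine-tuning no longer sees zero-valued rows for the special tokens.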

Why Use This Model?

  • Enhanced Fine-tuning Stability: Directly addresses and resolves the issue of untrained special tokens, preventing common fine-tuning instabilities.
  • Reliable Base Model: Offers a more dependable starting point for developers looking to fine-tune Llama 3-70B for specific applications or with custom token sets.
  • Community-Driven Solution: Created in response to community requests to fix a known flaw in the original Llama 3 base model, ensuring broader usability.

This model is particularly beneficial for developers who plan to extend Llama 3-70B with new tokens or rely heavily on its special tokens for instruction following, providing a seamless and stable fine-tuning experience.

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model cover the following settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p