astronomer/Llama-3-70B-Special-Tokens-Adjusted
Text Generation | Concurrency Cost: 4 | Model Size: 70B | Quant: FP8 | Ctx Length: 8k | Published: Jul 23, 2024 | License: llama-3 | Architecture: Transformer | 0.0K Warm
astronomer/Llama-3-70B-Special-Tokens-Adjusted is a 70-billion-parameter Llama 3 model published by Astronomer in which the input and output embedding weights of previously untrained special tokens have been adjusted. Untrained token embeddings can cause instabilities during fine-tuning, such as gradient explosions or NaN gradients, so the adjustment is intended to make the model a more stable base for further fine-tuning, whether you add custom tokens or use the existing special tokens. The Llama 3 architecture and the 8192-token context length are unchanged.
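To make the idea of adjusting untrained embedding rows concrete, here is a minimal sketch in plain Python. It rests on two assumptions that this page does not confirm: that an untrained token can be recognized by an embedding row that is all zeros (within a hypothetical tolerance `eps`), and that such rows are replaced with the mean of the trained rows. The function names and the toy matrix are illustrative only; a real model would apply the same logic to the input and output embedding tensors.

```python
from typing import List

Matrix = List[List[float]]


def find_untrained_rows(embeddings: Matrix, eps: float = 1e-16) -> List[int]:
    """Return indices of rows whose entries are all (near) zero.

    Assumption: an all-zero row marks a token that was never trained.
    """
    return [
        i for i, row in enumerate(embeddings)
        if all(abs(v) <= eps for v in row)
    ]


def mean_of_rows(embeddings: Matrix, rows: List[int]) -> List[float]:
    """Element-wise mean over the selected rows."""
    dim = len(embeddings[0])
    return [sum(embeddings[i][d] for i in rows) / len(rows) for d in range(dim)]


def adjust_untrained_rows(embeddings: Matrix, eps: float = 1e-16) -> Matrix:
    """Return a copy in which untrained rows are set to the trained-row mean.

    Assumption: mean replacement is the adjustment; the page does not say so.
    """
    untrained = set(find_untrained_rows(embeddings, eps))
    trained = [i for i in range(len(embeddings)) if i not in untrained]
    if not untrained or not trained:
        # Nothing to fix, or nothing to average over: return an unchanged copy.
        return [row[:] for row in embeddings]
    mean_row = mean_of_rows(embeddings, trained)
    return [
        mean_row[:] if i in untrained else row[:]
        for i, row in enumerate(embeddings)
    ]


# Toy 3-token vocabulary with 2-dimensional embeddings; token 2 is untrained.
toy = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
adjusted = adjust_untrained_rows(toy)
print(adjusted[2])
```

In the toy example, the all-zero row for token 2 is replaced by the average of the two trained rows, while the trained rows are left untouched. Giving a previously zero row a value in the range of the trained embeddings is one plausible way to avoid the extreme gradients mentioned above.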