imone/Llama-3-8B-fixed-special-embedding
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: Apr 21, 2024 | License: llama3 | Architecture: Transformer

imone/Llama-3-8B-fixed-special-embedding is an 8-billion-parameter Llama 3 model with a context length of 8192 tokens. This version addresses potential NaN gradients by re-initializing the weights of three special tokens. It is intended for stable training and fine-tuning of Llama 3 models in cases where these special tokens might otherwise cause issues.
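
The idea of re-initializing selected embedding rows can be illustrated with a small sketch. This is not the procedure used to produce this model: the helper name `reinit_rows`, the toy matrix, the token ids, and the choice of replacing each selected row with the mean of the remaining rows are all assumptions made for illustration, and a NumPy array stands in for the model's embedding table.

```python
import numpy as np


def reinit_rows(embeddings, token_ids):
    """Return a copy of `embeddings` with the rows in `token_ids` re-initialized.

    Assumption: each selected row is replaced by the mean of all other rows.
    The source does not say which initialization this model actually uses.
    """
    emb = np.array(embeddings, dtype=np.float32, copy=True)
    keep = np.ones(emb.shape[0], dtype=bool)
    keep[list(token_ids)] = False          # rows to overwrite are marked False
    emb[~keep] = emb[keep].mean(axis=0)    # broadcast the mean row into each slot
    return emb


# Toy embedding table: 8 "tokens" with 4 dimensions each (hypothetical sizes).
vocab = np.random.default_rng(0).normal(size=(8, 4)).astype(np.float32)

# Hypothetical ids for three special tokens; zero rows stand in for
# embeddings that were never trained.
special_ids = [5, 6, 7]
vocab[special_ids] = 0.0

fixed = reinit_rows(vocab, special_ids)
```

After the call, `fixed` keeps the ordinary rows unchanged and gives the three selected rows a value in the same range as the rest of the table, while `vocab` itself is left untouched. With a real checkpoint the same operation would be applied to the model's input embedding matrix, and possibly to the output projection as well.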
