shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Mar 27, 2026 · License: llama3 · Architecture: Transformer
shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64 is an 8-billion-parameter language model, fine-tuned from shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4 on the alpaca_en dataset. Built on the Llama 3 architecture, it targets general language understanding and generation tasks. It was trained for 3 epochs with a learning rate of 1e-05 and a total batch size of 64, making it suitable for applications that need a moderately sized, instruction-tuned LLM.
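The card lists the base checkpoint, dataset, and training hyperparameters but no usage snippet. Below is a minimal loading-and-generation sketch using the Hugging Face transformers library, assuming the checkpoint is published on the Hub under the repo ID above and that an Alpaca-style prompt template matches the alpaca_en fine-tune; both are assumptions, not details confirmed by the card.

```python
# Minimal sketch: load the checkpoint and generate a completion.
# Assumptions: the repo ID resolves on the Hugging Face Hub, the weights
# load as bf16 (the hosted endpoint above serves FP8), and the model
# follows the Alpaca prompt format from its alpaca_en fine-tuning data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Alpaca-style instruction prompt (assumption: the exact template may differ).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a language model is.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Print only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
))
```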