shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64 is an 8-billion-parameter language model trained from scratch, likely based on the Llama 3 architecture. The model underwent supervised fine-tuning (SFT) with a batch size of 64, a stage typically aimed at instruction following or specific task performance. Key training hyperparameters include a learning rate of 1e-05 and 3 epochs, suggesting a targeted training run for a specific, though currently unspecified, application.
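As a rough illustration only, the reported hyperparameters (effective batch size 64, learning rate 1e-05, 3 epochs) could be expressed as Hugging Face `TrainingArguments`; the device split, precision, and output path below are assumptions, not details taken from this model card:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported SFT hyperparameters onto
# Hugging Face TrainingArguments. The per-device batch-size split,
# precision, and output path are assumptions, not published values.
args = TrainingArguments(
    output_dir="./llama3-8b-sft",      # placeholder output path
    per_device_train_batch_size=8,     # 8 per device x 8 devices = 64 (assumed split)
    gradient_accumulation_steps=1,     # assumed: no accumulation
    learning_rate=1e-5,                # reported learning rate
    num_train_epochs=3,                # reported epoch count
    bf16=True,                         # assumed mixed-precision setting
)
```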
