shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8K · Published: Mar 27, 2026 · License: llama3 · Architecture: Transformer · Status: Cold

The shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4 model is an 8-billion-parameter language model fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8. Training used a learning rate of 1e-05 with a cosine learning-rate scheduler over 3 epochs. No specific differentiators or intended use cases are documented, which suggests this is an experimental or intermediate pre-training checkpoint.
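The stated hyperparameters can be expressed as a transformers.TrainingArguments sketch. The learning rate, scheduler, and epoch count come from the card above; the per-device batch size is an assumption inferred from the "bs4" suffix in the model name, and the output directory is illustrative.

```python
# Hedged reconstruction of the reported training setup; fields marked
# "assumption" are not stated in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-2-7m-bs4",  # illustrative
    learning_rate=1e-5,              # stated in the card
    lr_scheduler_type="cosine",      # stated in the card
    num_train_epochs=3,              # stated in the card
    per_device_train_batch_size=4,   # assumption, inferred from the "bs4" suffix
)
```

For inference, a minimal usage sketch follows, assuming the checkpoint is published on the Hugging Face Hub under the same identifier in a Transformers-compatible format; the prompt and generation settings are illustrative.

```python
# Minimal text-generation sketch with the Hugging Face Transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4"  # assumed Hub repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place weights on available devices automatically
)

# Plain completion: this appears to be a base/pretrain-style checkpoint,
# so no chat template is applied.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```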
