shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer


Model Overview

shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64 is an 8 billion parameter language model pretrained from scratch, likely using the Llama 3 architecture, as its name suggests. It subsequently underwent supervised fine-tuning (SFT) with a total training batch size of 64, suggesting optimization for instruction following or domain adaptation.
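
The model card does not show usage code, but if the checkpoint is published in standard Hugging Face Transformers format, loading it would look like the following minimal sketch; the prompt, dtype, and device settings are illustrative assumptions, not details from the card.

```python
# Minimal sketch: loading the checkpoint with Hugging Face Transformers.
# Assumes the repository is in standard Transformers format; precision and
# device placement below are illustrative, not confirmed by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let the checkpoint decide precision
    device_map="auto",    # spread layers across available GPUs
)

# Hypothetical prompt; the card does not specify a chat or prompt template.
prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```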

Training Details

The training process used the AdamW optimizer with a learning rate of 1e-05 over 3 epochs. Key hyperparameters included a per-device train_batch_size of 8, gradient_accumulation_steps of 2, and a cosine learning rate scheduler with a warmup of 0.1 (reported as "0.1 warmup steps", almost certainly a warmup ratio, since fractional steps are not meaningful). Training ran on a multi-GPU setup with 4 devices, giving an effective batch size of 8 × 2 × 4 = 64, consistent with the bs64 suffix in the model name.
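
For concreteness, the reported hyperparameters map onto transformers.TrainingArguments as in the sketch below. Only the values named above come from the model card; the output directory is hypothetical, and the dataset and Trainer wiring are omitted since the card does not describe them.

```python
# Illustrative sketch of the reported SFT hyperparameters using
# transformers.TrainingArguments. Values marked "reported" come from the
# model card; everything else is an assumption for demonstration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-sft",        # hypothetical output path
    per_device_train_batch_size=8,     # reported train_batch_size
    gradient_accumulation_steps=2,     # reported accumulation steps
    learning_rate=1e-5,                # reported learning rate
    num_train_epochs=3,                # reported epochs
    lr_scheduler_type="cosine",        # reported cosine schedule
    warmup_ratio=0.1,                  # reported 0.1 warmup (as a ratio)
    optim="adamw_torch",               # reported AdamW optimizer
)
# Effective batch size: 8 per device × 2 accumulation × 4 GPUs = 64.
```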

Current Status

The model card does not yet provide details on the training dataset, intended uses, limitations, or evaluation results. Further information is needed to assess the model's capabilities and suitable applications.