shuoxing/llama3-8b-full-sft-c4-1m-en-v2
The shuoxing/llama3-8b-full-sft-c4-1m-en-v2 is an 8 billion parameter Llama 3-based causal language model, fine-tuned by shuoxing on the alpaca_en dataset. This model is a supervised fine-tuned (SFT) version, building upon a pre-trained Llama 3 variant. It is designed for general language generation tasks, leveraging its fine-tuning to enhance conversational and instruction-following capabilities.
Loading preview...
Model Overview
The shuoxing/llama3-8b-full-sft-c4-1m-en-v2 is an 8 billion parameter language model based on the Llama 3 architecture. Developed by shuoxing, this model is a supervised fine-tuned (SFT) version, specifically trained on the alpaca_en dataset. It builds upon the shuoxing/llama3-8b-full-pretrain-c4-1m-en base model, aiming to improve its instruction-following and conversational abilities.
Key Training Details
This model underwent fine-tuning with the following notable hyperparameters:
- Learning Rate: 1e-05
- Batch Size: A
train_batch_sizeof 1 andgradient_accumulation_stepsof 2 resulted in atotal_train_batch_sizeof 16 across 8 GPUs. - Optimizer: Utilized
ADAMW_TORCH_FUSEDwith standard beta values and epsilon. - Scheduler: Employs a cosine learning rate scheduler with 0.1 warmup steps.
- Epochs: Trained for 3.0 epochs.
Intended Use Cases
While specific intended uses are not detailed in the provided information, as a supervised fine-tuned model on the alpaca_en dataset, it is generally suitable for tasks requiring:
- Instruction following
- Conversational AI
- General text generation based on prompts
Users should be aware that more detailed information regarding its specific strengths and limitations is needed for comprehensive evaluation.