afrilang/llama3-8b-full-sft
afrilang/llama3-8b-full-sft is an 8-billion-parameter language model fine-tuned from Meta-Llama-3-8B-Instruct on the afrilang_sft dataset, indicating a focus on language or domain tasks related to 'afrilang'. It builds on the Llama 3 architecture for its specialized application.
Model Overview
afrilang/llama3-8b-full-sft is an 8-billion-parameter language model fine-tuned from the Meta-Llama-3-8B-Instruct base model. The adaptation was performed with the afrilang_sft dataset, suggesting specialization in tasks or languages relevant to the 'afrilang' context.
Key Training Details
The model underwent supervised fine-tuning (SFT) with the following notable hyperparameters (a configuration sketch follows the list):
- Base Model: `meta-llama/Meta-Llama-3-8B-Instruct`
- Learning Rate: `1e-05`
- Batch Size: a total training batch size of 16 (1 per device with 8 gradient accumulation steps)
- Optimizer: `ADAMW_TORCH_FUSED` with standard betas and epsilon
- Scheduler: cosine learning rate scheduler with a 0.1 warmup ratio
- Epochs: trained for 3.0 epochs
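For reference, here is a minimal sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments`. The original training script is not published, so the output directory and any settings not listed above are assumptions:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported SFT settings; the model card
# does not include the actual training script.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-sft",   # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # the reported total batch size of 16 implies 2 devices
    num_train_epochs=3.0,
    optim="adamw_torch_fused",         # ADAMW_TORCH_FUSED with default betas and epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```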
Intended Use Cases
While specific intended uses and limitations are not detailed in the original README, the fine-tuning on the afrilang_sft dataset implies that this model is likely optimized for:
- Language-specific tasks: Potentially for African languages or tasks requiring understanding of specific cultural or linguistic nuances.
- Instruction-following in specialized domains: Building on the instruction-tuned base model, it should follow instructions reliably within its fine-tuned domain.
Users should be aware that the full scope of its capabilities and limitations would require further evaluation, especially concerning its performance on general tasks versus its specialized domain.
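As a starting point for such evaluation, the model should load with the standard `transformers` chat workflow inherited from its Llama 3 base. The prompt below is purely illustrative, since the afrilang_sft dataset's actual task coverage is not documented:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "afrilang/llama3-8b-full-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative prompt only; swap in a task relevant to your use case.
messages = [{"role": "user", "content": "Translate 'Good morning' into Swahili."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```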