afrilang/llama3-8b-full-sft

  • Task: Text generation
  • Model size: 8B
  • Quantization: FP8
  • Context length: 8k
  • Concurrency cost: 1
  • Published: Jan 31, 2026
  • License: other
  • Architecture: Transformer

afrilang/llama3-8b-full-sft is an 8-billion-parameter language model fine-tuned from Meta-Llama-3-8B-Instruct on the afrilang_sft dataset, which suggests a focus on language or domain tasks related to 'afrilang'. It builds on the instruction-tuned Llama 3 architecture for its specialized application.


Model Overview

afrilang/llama3-8b-full-sft is an 8-billion-parameter language model fine-tuned from the Meta-Llama-3-8B-Instruct base model. The adaptation was performed on the afrilang_sft dataset, suggesting specialization in tasks or languages relevant to the 'afrilang' context.

Key Training Details

The model underwent supervised fine-tuning (SFT) with the following notable hyperparameters (a configuration sketch follows the list):

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Learning Rate: 1e-05
  • Batch Size: A total training batch size of 16 (1 per device × 8 gradient accumulation steps, which implies 2 training devices).
  • Optimizer: ADAMW_TORCH_FUSED with standard betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 3.0 epochs.
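
For reference, here is a minimal sketch of how these hyperparameters map onto Hugging Face TrainingArguments. Only the values listed above come from the card; the output directory and the rest of the setup are illustrative assumptions.

```python
from transformers import TrainingArguments

# Sketch only: hyperparameter values are taken from the list above;
# the output directory is a hypothetical placeholder. These arguments
# would then be passed to a Trainer (or trl's SFTTrainer) for SFT.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-sft",   # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # 1 per device x 8 steps x 2 devices = 16 total
    optim="adamw_torch_fused",         # fused AdamW with default betas/epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
```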

Intended Use Cases

While the original README does not detail specific intended uses and limitations, fine-tuning on the afrilang_sft dataset suggests the model is likely optimized for:

  • Language-specific tasks: Potentially for African languages or tasks requiring understanding of specific cultural or linguistic nuances.
  • Instruction-following in specialized domains: Leveraging the instruction-tuned base model, it should follow instructions well within its fine-tuned domain (see the usage sketch after this list).
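
For illustration, here is a minimal inference sketch using the transformers library. The repository id comes from this card; the prompt, dtype, and device settings are assumptions, and it presumes the model inherits the Llama 3 chat template from its base (note that the hosted endpoint above reports FP8 quantization, whereas this loads the published weights directly).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "afrilang/llama3-8b-full-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumes the chat template is inherited from Meta-Llama-3-8B-Instruct.
# The prompt is a hypothetical example of a language-specific task.
messages = [{"role": "user", "content": "Translate 'good morning' into Swahili."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```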

The full scope of the model's capabilities and limitations has not been evaluated. Users should assess its performance on general tasks, as well as within its specialized domain, before relying on it.