Model Overview
This model, cfierro/llama-3.1-8b-fft-simpleqa-ar, is a fully fine-tuned version of the meta-llama/Llama-3.1-8B-Instruct base model. It was trained by cfierro using the Axolotl framework, specifically targeting question answering in Arabic.
Key Characteristics
- Base Model: Meta Llama-3.1-8B-Instruct
- Parameter Count: 8 billion
- Context Length: 32768 tokens
- Training Data: Fine-tuned on the cfierro/simpleqa_wiki_ar_Llama-3.1-8B-Instruct dataset, indicating a specialization in Arabic question answering.
- Training Method: Full fine-tuning (no LoRA) using DeepSpeed ZeRO Stage 3 for efficient multi-GPU training.
- Hyperparameters: Trained with a learning rate of 1e-05 over 3000 steps, using a constant learning-rate schedule and the 8-bit AdamW optimizer.
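The reported optimizer setup can be sketched in plain PyTorch. This is a minimal, runnable illustration only: the actual run used 8-bit AdamW (via bitsandbytes) on the full 8B model under DeepSpeed ZeRO Stage 3, whereas here a tiny stand-in module and standard `torch.optim.AdamW` keep the sketch executable anywhere.

```python
import torch

# Stand-in for the 8B model; the real run fully fine-tunes all parameters.
model = torch.nn.Linear(8, 8)

# Reported settings: lr 1e-5, 8-bit AdamW (plain AdamW used here as a proxy).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A "constant" scheduler multiplies the base lr by 1.0 at every step,
# so the learning rate never decays over the 3000 training steps.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda step: 1.0)

for _ in range(3000):
    optimizer.step()
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # still 1e-05 after 3000 steps
```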
Intended Use Cases
This model is primarily designed for:
- Arabic Question Answering: Answering questions that draw on the knowledge acquired from its specialized Arabic dataset.
- Knowledge Retrieval: Potentially useful for extracting information from Arabic text.
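A minimal inference sketch with the transformers library is shown below, assuming the standard Llama 3.1 chat format with a single user turn; the generation settings and the sample question are illustrative, not from the original card, and `device_map="auto"` additionally requires the accelerate package.

```python
def build_messages(question: str) -> list:
    # Llama 3.1 Instruct checkpoints expect a chat-formatted conversation;
    # a single user turn carrying the Arabic question is assumed here.
    return [{"role": "user", "content": question}]

def answer(question: str, max_new_tokens: int = 128) -> str:
    # Imported lazily so the helper above can be used without the
    # (large) transformers dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "cfierro/llama-3.1-8b-fft-simpleqa-ar"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Usage (downloads roughly 16 GB of weights on first run):
# print(answer("ما هي عاصمة المملكة العربية السعودية؟"))
```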
Limitations
As noted in the original model card, more information is needed on specific intended uses and limitations. Users should evaluate the model's performance on their own Arabic QA tasks before deployment.