The israel/AfriqueQwen-14B-Fact-full model is a 14-billion-parameter language model fine-tuned from McGill-NLP/AfriqueQwen-14B and optimized for factual tasks through training on the afrifact dataset. With a 32768-token context length, it is designed for applications requiring accurate information retrieval and generation.
Model Overview
israel/AfriqueQwen-14B-Fact-full is a 14-billion-parameter language model fine-tuned from the McGill-NLP/AfriqueQwen-14B base model. Its primary specialization is factual information processing, achieved through fine-tuning on the afrifact dataset. Built on the Qwen architecture, it supports a substantial context length of 32768 tokens, making it suitable for extensive factual queries and long documents.
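As a minimal sketch, the model should load through the standard transformers API; the dtype and device-placement choices below are illustrative, not prescribed by this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "israel/AfriqueQwen-14B-Fact-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # let the checkpoint's stored precision decide
    device_map="auto",   # spread the 14B parameters across available devices (requires accelerate)
)
```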
Key Capabilities
- Factual Information Processing: Optimized for tasks requiring accurate factual recall and generation, owing to fine-tuning on the afrifact dataset.
- Large Context Window: A 32768-token context length allows comprehensive understanding and generation over lengthy inputs (see the sketch after this list).
- Qwen Architecture: Leverages the robust capabilities of the Qwen model family.
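Continuing from the loading sketch above, here is a hedged example of long-context factual question answering. It assumes the tokenizer ships a Qwen-style chat template; the document text, question, and generation settings are placeholders:

```python
long_document = "..."  # placeholder: up to ~32768 tokens of source material
question = "..."       # placeholder: a factual question about the document

messages = [{
    "role": "user",
    "content": f"Answer strictly from the document below.\n\n{long_document}\n\nQuestion: {question}",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)  # illustrative generation budget
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```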
Training Details
The model was trained with a learning rate of 1e-05 and a total batch size of 8 across 4 GPUs, for 3 epochs, using the adamw_torch_fused optimizer and a cosine learning rate scheduler with a warmup ratio of 0.1. The training environment used Transformers 5.2.0, PyTorch 2.10.0+cu128, Datasets 4.0.0, and Tokenizers 0.22.2.
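For readers reproducing the setup, the reported hyperparameters map naturally onto transformers.TrainingArguments. This is a hypothetical reconstruction: the per-device batch size of 2 (giving the total of 8 across 4 GPUs) and the output directory are assumptions, not values stated in this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="afriqueqwen-14b-fact-full",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=2,  # assumed: 2 x 4 GPUs = total batch size 8
    num_train_epochs=3,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```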
Good For
- Applications requiring high accuracy in factual question answering.
- Information extraction and summarization from factual texts.
- Tasks benefiting from a large context window for factual analysis.