AfriqueQwen-14B-multiturn, developed by McGill-NLP, is a 14-billion-parameter language model fine-tuned from McGill-NLP/AfriqueQwen-14B. It is optimized for multi-turn conversational tasks using the afri_multiturn dataset and supports a 32,768-token context length, making it suited to applications that require nuanced, extended dialogue in African languages.
AfriqueQwen-14B-multiturn Overview
This model is a specialized 14-billion-parameter language model, fine-tuned by McGill-NLP from its base model, McGill-NLP/AfriqueQwen-14B. Its 32,768-token context window makes it well suited to long conversational exchanges.
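A minimal loading sketch is shown below; it assumes the checkpoint exposes the standard Hugging Face AutoModelForCausalLM and AutoTokenizer interfaces, which is typical for Qwen-derived models but not confirmed by this card:

```python
# Minimal loading sketch (assumes a standard Hugging Face causal-LM layout).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/AfriqueQwen-14B-multiturn"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's native precision
    device_map="auto",   # shard the 14B parameters across available GPUs
)
```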
Key Capabilities
- Multi-turn Dialogue: Specifically optimized for extended, multi-turn conversations (see the usage sketch after this list).
- African Language Focus: Fine-tuned on the afri_multiturn dataset, indicating a focus on conversational AI for African languages.
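Multi-turn prompting is usually driven through the tokenizer's chat template. Continuing from the loading sketch above, the example below assumes such a template is bundled with the tokenizer (standard for Qwen-derived instruction models); the message contents are placeholders:

```python
# Hypothetical multi-turn exchange; message contents are placeholders.
messages = [
    {"role": "user", "content": "First user turn."},
    {"role": "assistant", "content": "Model's earlier reply."},
    {"role": "user", "content": "Follow-up question."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```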
Training Details
The model was trained with a learning rate of 1e-05 under a cosine schedule with a warmup ratio of 0.1, for 5 epochs. Optimization used AdamW with its configured beta and epsilon values, distributed across 4 GPUs for a total batch size of 8. The training environment included Transformers 5.2.0 and PyTorch 2.10.0+cu128.
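These hyperparameters map onto the transformers Trainer API roughly as follows. This is an illustrative reconstruction, not the published training script: the per-device batch size of 2 is inferred from 4 GPUs and a total batch size of 8 (assuming no gradient accumulation), and the AdamW betas/epsilon are left at their defaults here:

```python
# Illustrative reconstruction of the reported training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="afriqueqwen-14b-multiturn",  # hypothetical output path
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
    optim="adamw_torch",            # AdamW; betas/epsilon left at defaults
    per_device_train_batch_size=2,  # inferred: 4 GPUs x 2 = total batch of 8
)
```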
Good For
- Developing conversational AI agents for African language contexts.
- Applications requiring models capable of understanding and generating long, multi-turn dialogues.
- Research into large language models adapted for specific linguistic and cultural datasets.