israel/AfriqueQwen-14B-multiturn_2
AfriqueQwen-14B-multiturn_2 is a 14-billion-parameter language model fine-tuned from McGill-NLP/AfriqueQwen-14B on the afri_multiturn_2 dataset. It targets multi-turn conversational tasks and supports a 32K-token context length. The dataset name suggests a focus on African-language or Africa-specific dialogue, though the card does not confirm this.
Model Overview
AfriqueQwen-14B-multiturn_2 is a 14 billion parameter language model, fine-tuned from the base model McGill-NLP/AfriqueQwen-14B. This iteration has been specifically adapted for multi-turn conversational applications through fine-tuning on the afri_multiturn_2 dataset.
Key Characteristics
- Base Model: Derived from McGill-NLP/AfriqueQwen-14B.
- Parameter Count: 14 billion parameters, offering a balance between performance and computational requirements.
- Context Length: Supports a substantial context window of 32,768 tokens, enabling the model to handle longer and more complex multi-turn dialogues.
- Fine-tuning Focus: Fine-tuned for multi-turn interactions, which is intended to improve coherence and context retention across conversational exchanges.
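Long conversations can still outgrow a 32,768-token window, so applications usually trim older turns before each request. The following is a minimal sketch of that idea, not something taken from the model card: `trim_history`, `reserve_for_reply`, and the whitespace-based token counter are hypothetical names introduced for illustration, and a real application would count tokens with the model's own tokenizer.

```python
from typing import Callable, Dict, List

MAX_CONTEXT_TOKENS = 32_768  # context length stated in the model card

Message = Dict[str, str]


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer; real token counts will differ."""
    return len(text.split())


def trim_history(
    messages: List[Message],
    reserve_for_reply: int = 1024,  # hypothetical budget for the generated answer
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_CONTEXT_TOKENS,
) -> List[Message]:
    """Keep the most recent messages whose combined size fits the budget."""
    budget = max_tokens - reserve_for_reply
    kept: List[Message] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping whole messages from the oldest end keeps the remaining turns intact; summarizing old turns instead is another option that this sketch does not cover.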
Training Details
The model was trained for 5 epochs with a learning rate of 1e-05, distributed across 4 GPUs. The total batch size was 8, with 2 gradient accumulation steps, and the optimizer was AdamW_TORCH_FUSED (PyTorch's fused AdamW implementation). The learning rate followed a cosine schedule with a warmup ratio of 0.1.
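To make the reported hyperparameters concrete, here is a rough sketch of how they fit together. It assumes linear warmup followed by cosine decay to zero, which is a common form of the scheduler but is not confirmed by the card. The per-device batch size is inferred from the reported totals rather than stated, and `total_steps` is a hypothetical value because the card does not report the number of training steps.

```python
import math

# Values reported in the model card.
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 2
TOTAL_BATCH_SIZE = 8

# Inferred, not stated in the card:
# total batch size = per-device batch size * GPUs * accumulation steps.
PER_DEVICE_BATCH_SIZE = TOTAL_BATCH_SIZE // (NUM_GPUS * GRAD_ACCUM_STEPS)


def cosine_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the card does not report the step count
    print("per-device batch size:", PER_DEVICE_BATCH_SIZE)
    for step in (0, 50, 100, 550, 1000):
        print(step, f"{cosine_lr(step, total_steps):.2e}")
```

Under these assumptions the rate climbs to 1e-05 over the first 10% of steps, falls to about half that value midway through the decay phase, and reaches zero at the final step.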
Potential Use Cases
- Multi-turn Dialogue Systems: Suited to chatbots, virtual assistants, and other conversational AI applications that need to retain context across many turns.
- African Language Processing: Given its fine-tuning on the afri_multiturn_2 dataset, it may offer specialized capabilities for African language-specific conversational tasks, though further details on the dataset's linguistic scope are needed.
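For the dialogue use case, a multi-turn exchange could be driven roughly as follows. This is a sketch under several assumptions: that the checkpoint is published on the Hugging Face Hub under the name in the page title, that its tokenizer defines a chat template, and that the `transformers` and `accelerate` packages are installed. The helper names `add_turn`, `load_model`, and `generate_reply` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Repository name taken from the page title; assumed to be a Hub model ID.
MODEL_ID = "israel/AfriqueQwen-14B-multiturn_2"

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended."""
    return history + [{"role": role, "content": content}]


def load_model() -> Tuple[Any, Any]:
    """Load tokenizer and model once; this downloads the 14B checkpoint."""
    # Imported lazily so add_turn works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"  # device_map needs accelerate
    )
    return tokenizer, model


def generate_reply(tokenizer: Any, model: Any, history: List[Message],
                   max_new_tokens: int = 256) -> str:
    """Generate the next assistant turn from the full conversation history."""
    # Assumes the tokenizer ships a chat template.
    prompt = tokenizer.apply_chat_template(
        history, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the tokens generated after the prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A conversation loop would call `load_model()` once, then alternate `add_turn(history, "user", ...)`, `generate_reply(...)`, and `add_turn(history, "assistant", reply)` so that each request carries the full prior exchange.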
Limitations
The model card indicates that more information is needed regarding its specific intended uses, limitations, and the detailed nature of the training and evaluation data. Users should exercise caution and conduct thorough evaluations for specific applications.