Model Overview
nazimali/Mistral-Nemo-Kurdish is a 12-billion-parameter language model produced by continued pre-training of mistralai/Mistral-Nemo-Instruct-2407. The primary goal of this pre-training was to significantly enhance the model's understanding of the Kurdish language.
Key Characteristics
- Kurdish Language Focus: Pre-trained extensively on the nazimali/kurdish-wikipedia-articles dataset, comprising over 62,000 articles, to improve Kurdish language comprehension.
- Quantized Model: Uses bitsandbytes for quantization, reducing memory usage and making the model more accessible for deployment.
- Base Model for Fine-tuning: Intended as a foundational base that requires further fine-tuning for specific downstream tasks. An instruction-tuned version, nazimali/Mistral-Nemo-Kurdish-Instruct, is also available.
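Since the model is distributed for use with bitsandbytes quantization, loading it typically looks like the sketch below. This is an assumption based on the standard Transformers + bitsandbytes workflow, not an official snippet from the model card; the 4-bit settings shown are illustrative defaults and require a CUDA-capable GPU plus a network connection to download the weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "nazimali/Mistral-Nemo-Kurdish"

# Illustrative 4-bit quantization config; adjust to match your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs automatically
)

# Base-model usage: plain text continuation, not instruction following.
inputs = tokenizer("Kurdistan", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base (not instruction-tuned) model, prompts should be written as text to continue; for chat-style use, the separate Instruct variant is the better starting point.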
Training Details
The model was trained for approximately 6.5 hours on a single NVIDIA A100 80GB PCIe GPU, reaching a final training loss of 1.2108. Training used a custom prompt format for the Wikipedia articles.
Limitations
Currently, there is a lack of standardized evaluation metrics for Kurdish language models, making direct performance comparisons challenging. The developer plans to address this by creating a reproducible evaluation benchmark for Kurdish.