nazimali/Mistral-Nemo-Kurdish

12B parameters · 32768-token context length · License: apache-2.0

Model Overview

nazimali/Mistral-Nemo-Kurdish is a 12-billion-parameter language model produced by continued pre-training of mistralai/Mistral-Nemo-Instruct-2407. The primary goal of this pre-training was to strengthen the model's understanding of the Kurdish language.
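For orientation, here is a minimal sketch of loading the model and sampling a completion with the Hugging Face Transformers library. The dtype, device placement, and generation settings are illustrative assumptions, not values taken from the model card.

```python
# Minimal usage sketch; the settings below are assumptions, not from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nazimali/Mistral-Nemo-Kurdish"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the 12B weights on an 80GB GPU
    device_map="auto",
)

# This is a continued-pre-training base model, so plain text completion
# (rather than chat-style prompting) is the natural way to probe it.
inputs = tokenizer("Kurdistan", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```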

Key Characteristics

  • Kurdish Language Focus: Pre-trained extensively on the nazimali/kurdish-wikipedia-articles dataset, comprising over 62,000 articles, to improve Kurdish language comprehension.
  • Quantized Model: Uses bitsandbytes for quantization, which reduces memory usage and makes the model easier to deploy (see the loading sketch after this list).
  • Base Model for Fine-tuning: This model is intended as a foundational base that requires further fine-tuning for specific downstream tasks. An instruction-tuned version, nazimali/Mistral-Nemo-Kurdish-Instruct, is also available.
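
Because the card names bitsandbytes but not the exact quantization settings, the sketch below loads the model in 4-bit via Transformers' BitsAndBytesConfig; the NF4 quantization type and bfloat16 compute dtype are common defaults assumed here, not details confirmed by the card.

```python
# Hedged sketch of quantized loading; the 4-bit/NF4 choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "nazimali/Mistral-Nemo-Kurdish"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # assumption: the card only says "bitsandbytes"
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```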

Training Details

The model was trained for approximately 6.5 hours on a single NVIDIA A100 80GB PCIe GPU, reaching a final training loss of 1.2108. Training used a custom prompt format for the Wikipedia articles; a hypothetical illustration follows.
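The card does not reproduce the actual prompt template, so the template and dataset field names below (title, text) are hypothetical stand-ins that only illustrate how articles from nazimali/kurdish-wikipedia-articles might be assembled into training text.

```python
# Hypothetical formatting sketch; the real template and field names are not
# published in the model card.
from datasets import load_dataset

# Assumption: the dataset exposes a "train" split.
dataset = load_dataset("nazimali/kurdish-wikipedia-articles", split="train")

def format_article(example):
    # Hypothetical template: article title, blank line, article body.
    return {"text": f"{example['title']}\n\n{example['text']}"}

formatted = dataset.map(format_article)
print(formatted[0]["text"][:200])
```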

Limitations

There are currently no standardized evaluation benchmarks for Kurdish language models, which makes direct performance comparisons difficult. The developer plans to address this by creating a reproducible evaluation benchmark for Kurdish.