ciskoM/wolof-qwen-1.5b
The ciskoM/wolof-qwen-1.5b is a 1.5 billion parameter Qwen2.5-Instruct model fine-tuned for the Wolof language. It specializes in conversational chat and translation between Wolof, French, and English, leveraging a 32768 token context length. This model is particularly optimized for low-resource language modeling and translation tasks involving Wolof. It was trained using QLoRA (4-bit) on a single GPU, making it efficient for specific linguistic applications.
Loading preview...
Wolof Qwen2.5-1.5B: Multilingual Chat and Translation
The ciskoM/wolof-qwen-1.5b model is a specialized fine-tune of the Qwen2.5-1.5B-Instruct architecture, designed to facilitate communication in the Wolof language. With 1.5 billion parameters and a 32768 token context length, its primary strength lies in translation between Wolof, French, and English, alongside basic conversational capabilities.
Key Capabilities
- Bidirectional Translation: Proficiently translates text between Wolof, French, and English.
- Conversational AI: Can engage in chat conversations, particularly when focused on translation tasks.
- Low-Resource Language Research: Serves as a valuable tool for experiments and research in Wolof language modeling.
- Efficient Training: Developed using Unsloth (QLoRA, 4-bit) on a single GPU, demonstrating efficient fine-tuning.
Training Details
The model was trained on approximately 120,000 instruction pairs in ShareGPT format, balanced across various translation directions and including monolingual Wolof data. The dataset incorporates diverse sources such as aligned sentence pairs from bilalfaye/english-wolof-french-translation and galsenai/centralized_wolof_french_translation_data, OCR'd Wolof e-books, and religious texts. The training procedure involved 1 epoch with a learning rate of 2e-4, focusing on assistant responses only.
Limitations
It's important to note that due to the dataset composition, the model is stronger at translation than at free-flowing Wolof conversation. Users should expect potential errors, especially with rare topics, long inputs, or variations in Wolof orthography. The model is not safety-tuned and its training data provenance is mixed, leading to a CC-BY-NC-4.0 license recommendation; users should verify underlying dataset licenses for commercial use.