# ai-for-good-lab/byol-nya-12b-merged
ai-for-good-lab/byol-nya-12b-merged is a 12-billion-parameter language model developed by ai-for-good-lab and based on Google's Gemma 3 architecture. It is fine-tuned for the Chichewa (nya) language, combining continual pre-training with instruction-following capabilities through model merging. The model is optimized for chat and instruction following in Chichewa, performs strongly on relevant benchmarks, and supports a 32,768-token context length.
## BYOL Chichewa 12B Merged Model
This model, developed by ai-for-good-lab, is a 12-billion-parameter language model derived from Google's Gemma 3 architecture. It is designed specifically for the Chichewa (nya) language and uses the BYOL framework for extending LLMs to low-resource languages. It was produced by merging two checkpoints, a continually pre-trained (CPT) version and an instruction-tuned (IT) version, back into the original Gemma 3 instruction model.
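As a quick start, the snippet below shows one way to load the merged checkpoint for Chichewa chat with Hugging Face transformers. This is a minimal sketch, assuming the repo exposes a standard causal-LM interface with the Gemma chat template; the dtype, generation settings, and the Chichewa prompt are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-for-good-lab/byol-nya-12b-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B parameters: bf16 keeps memory manageable
    device_map="auto",           # requires the accelerate package
)

# A simple Chichewa prompt ("Hello! How are you?")
messages = [{"role": "user", "content": "Moni! Muli bwanji?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```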
## Key Capabilities
- Chichewa Language Proficiency: Highly specialized for understanding and generating text in Chichewa.
- Instruction Following: Combines language knowledge with robust instruction-following capabilities.
- Merged Architecture: Benefits from both continual pre-training and supervised fine-tuning through a weight-merging process (sketched after this list), enhancing overall performance.
- Gemma 3 Base: Built on Google's gemma-3-12b-pt foundation model.
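The card does not spell out the exact merge recipe, but a common way to fold a CPT checkpoint and an IT checkpoint back into a base model is task-vector arithmetic: subtract the base weights from each fine-tuned checkpoint and add the scaled differences back onto the base. The sketch below illustrates that idea only; the function, coefficients, and checkpoint IDs are hypothetical, not this model's documented procedure.

```python
def merge_task_vectors(base_sd, cpt_sd, it_sd, alpha=0.5, beta=0.5):
    """Per-tensor merge: base + alpha * (CPT - base) + beta * (IT - base).

    All three arguments are PyTorch state dicts with identical keys;
    alpha and beta are illustrative mixing coefficients.
    """
    return {
        name: w + alpha * (cpt_sd[name] - w) + beta * (it_sd[name] - w)
        for name, w in base_sd.items()
    }

# Hypothetical usage (the CPT/IT repo IDs are placeholders):
# from transformers import AutoModelForCausalLM
# base = AutoModelForCausalLM.from_pretrained("google/gemma-3-12b-it").state_dict()
# cpt = AutoModelForCausalLM.from_pretrained("<cpt-checkpoint>").state_dict()
# it = AutoModelForCausalLM.from_pretrained("<it-checkpoint>").state_dict()
# merged = merge_task_vectors(base, cpt, it)
```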
## Recommended Use Cases
- Chat and Conversational AI: Ideal for developing chatbots and conversational agents in Chichewa.
- Instruction-Based Tasks: Excels at tasks requiring the model to follow specific instructions in Chichewa.
- Research and Development: Suitable for researchers working on low-resource language LLMs, particularly for Chichewa.
For detailed evaluation results and further technical insights, refer to the associated paper.