ai-for-good-lab/byol-nya-12b-merged

Vision · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Apr 15, 2026 · License: Gemma · Architecture: Transformer

The ai-for-good-lab/byol-nya-12b-merged is a 12-billion-parameter language model developed by ai-for-good-lab, based on Google's Gemma 3 architecture. It is fine-tuned specifically for the Chichewa (nya) language, combining continual pre-training with instruction-following capabilities through model merging. The model is optimized for chat and instruction following in Chichewa, offering strong performance on relevant benchmarks with a 32,768-token (32k) context length.


BYOL Chichewa 12B Merged Model

This model, developed by ai-for-good-lab, is a 12 billion parameter language model derived from Google's Gemma 3 architecture. It is specifically designed for the Chichewa (nya) language, leveraging the BYOL framework to extend LLMs to low-resource languages. The model is a result of merging two checkpoints: a continually pre-trained (CPT) version and an instruction-tuned (IT) version, back into the original Gemma 3 instruction model.
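The merge step described above can be pictured as a linear combination of "task vectors" (each checkpoint minus the base model), a common merging recipe. This is only a sketch: the one-parameter toy state dicts and the 0.5/0.5 weights below are illustrative assumptions, not the coefficients or method the authors actually used.

```python
# Illustrative sketch of merging two fine-tuned checkpoints back into a base
# model via task-vector arithmetic. Real merges operate on full tensor state
# dicts; single floats stand in for weight tensors here.

def linear_merge(base, cpt, it, w_cpt=0.5, w_it=0.5):
    """Per-parameter linear interpolation of task vectors into the base.

    merged = base + w_cpt * (cpt - base) + w_it * (it - base)
    The merge weights here are hypothetical, chosen only for illustration.
    """
    return {
        name: base[name]
        + w_cpt * (cpt[name] - base[name])
        + w_it * (it[name] - base[name])
        for name in base
    }

base = {"w": 1.0}  # original Gemma 3 instruction model (toy stand-in)
cpt = {"w": 1.4}   # continually pre-trained (CPT) checkpoint
it = {"w": 0.8}    # instruction-tuned (IT) checkpoint

merged = linear_merge(base, cpt, it)
print(merged["w"])  # ~1.1: base shifted halfway toward each task vector
```

In practice this kind of merge is applied across every tensor in the state dicts, often with tooling such as mergekit rather than by hand.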

Key Capabilities

  • Chichewa Language Proficiency: Highly specialized for understanding and generating text in Chichewa.
  • Instruction Following: Combines language knowledge with robust instruction-following capabilities.
  • Merged Architecture: Benefits from both continual pre-training and supervised fine-tuning through a merging process, enhancing overall performance.
  • Gemma 3 Base: Built upon the strong foundation of the Gemma 3-12b-pt model.

Recommended Use Cases

  • Chat and Conversational AI: Ideal for developing chatbots and conversational agents in Chichewa.
  • Instruction-Based Tasks: Excels at tasks requiring the model to follow specific instructions in Chichewa.
  • Research and Development: Suitable for researchers working on low-resource language LLMs, particularly for Chichewa.
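The card does not ship a usage snippet; the following is a minimal sketch of single-turn Chichewa chat via the Hugging Face `transformers` library, assuming the standard `AutoModelForCausalLM` loading path and chat template for Gemma 3 checkpoints. The `max_new_tokens` value is an illustrative default.

```python
# Minimal chat sketch for ai-for-good-lab/byol-nya-12b-merged.
# Heavy imports are done lazily inside the function so this file can be
# imported (or type-checked) without transformers/torch installed.

def chat(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a reply to one Chichewa user message (illustrative)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ai-for-good-lab/byol-nya-12b-merged"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example call (downloads the 12B weights; run on suitable hardware):
# print(chat("Muli bwanji?"))  # "How are you?" in Chichewa
```

Note that at FP8 the 12B weights still require a recent GPU with substantial memory; `device_map="auto"` lets `accelerate` place layers across available devices.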

For detailed evaluation results and further technical insights, refer to the associated paper.