ai-for-good-lab/byol-nya-12b-cpt

Vision · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Context Length: 32k · Published: Apr 15, 2026 · License: Gemma · Architecture: Transformer

The ai-for-good-lab/byol-nya-12b-cpt is a 12 billion parameter continually pre-trained (CPT) language model developed by ai-for-good-lab, based on Google's Gemma 3 architecture. It has been adapted to Chichewa (nya) using the BYOL framework, which extends the model's fluency in Chichewa while retaining its English capabilities. With a 32,768-token context length, this base model is primarily designed for text completion tasks in both Chichewa and English.

Overview

This model, byol-nya-12b-cpt, is a 12 billion parameter continually pre-trained (CPT) language model developed by ai-for-good-lab. It is built upon the google/gemma-3-12b-pt base model and has been specifically adapted for the Chichewa (nya) language. The adaptation process utilized the BYOL framework, which involved further training on a curated bilingual corpus of Chichewa and English text.

Key Capabilities

  • Bilingual Fluency: Extends the base model's knowledge and fluency in Chichewa while preserving its English language capabilities.
  • Continual Pre-Training: Represents a CPT stage, enhancing the model's understanding and generation in the target low-resource language.
  • Text Completion: As a base model (not instruction-tuned), its primary strength lies in generating coherent text completions; a minimal usage sketch follows this list.
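
As a completion-only base model, byol-nya-12b-cpt can be loaded with the standard Hugging Face transformers API. The sketch below is minimal and illustrative: the bfloat16/device settings and sampling parameters are assumptions, not configurations published for this model.

```python
# Minimal text-completion sketch for byol-nya-12b-cpt.
# Assumptions: a transformers version with Gemma 3 support and a GPU
# with enough memory for a 12B model in bfloat16; sampling settings
# are illustrative, not recommendations from the model authors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-for-good-lab/byol-nya-12b-cpt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base (non-instruct) model: plain completion, no chat template.
prompt = "Dzina langa ndi"  # Chichewa: "My name is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model is not instruction-tuned, prompts should read like the opening of a passage to be continued, in Chichewa, English, or a mix of both.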

Good For

  • Research and Development: Ideal for researchers exploring low-resource language adaptation and continual pre-training techniques.
  • Chichewa Language Applications: Suitable for applications requiring text generation or understanding in Chichewa.
  • Foundation for Fine-tuning: Can serve as a strong foundation for further instruction-tuning or task-specific fine-tuning for Chichewa-English bilingual tasks; a hedged fine-tuning sketch follows this list. For instruction-following, a merged variant is available.
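
For teams building on the CPT checkpoint, one common route is parameter-efficient fine-tuning. The sketch below uses LoRA via the peft library; the adapter targets, hyperparameters, and the placeholder corpus file are illustrative assumptions, not an official recipe from ai-for-good-lab.

```python
# Hedged LoRA fine-tuning sketch for byol-nya-12b-cpt.
# Assumptions: peft, datasets, and transformers installed;
# "nya_corpus.txt" is a placeholder text file; all hyperparameters
# below are illustrative, not published for this model.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "ai-for-good-lab/byol-nya-12b-cpt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Attach low-rank adapters to the attention projections -- a common
# choice for Gemma-style models, assumed here rather than documented.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Placeholder corpus: any dataset with a "text" column works.
dataset = load_dataset("text", data_files={"train": "nya_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="byol-nya-12b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```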