ai-for-good-lab/byol-mri-1b-cpt
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 15, 2026 · License: gemma · Architecture: Transformer
ai-for-good-lab/byol-mri-1b-cpt is a 1-billion-parameter continually pre-trained language model, adapted from google/gemma-3-1b-pt specifically for the Māori language. Developed with the BYOL framework, it extends the base model's fluency in Māori while retaining its English capabilities. It is intended primarily for text completion in Māori and English and supports a 32,768-token context length.
BYOL Māori 1B CPT: Continual Pre-Training for Low-Resource Languages
This model, developed by ai-for-good-lab using the BYOL framework, is a continually pre-trained (CPT) language model adapted for Māori (mri). It is based on google/gemma-3-1b-pt, with 1 billion parameters and a 32,768-token context length.
Key Capabilities
- Māori Language Extension: Significantly enhances the base Gemma model's proficiency and fluency in Māori through training on a curated bilingual corpus of Māori and English text.
- Bilingual Support: Retains the original English language capabilities of the base model while expanding into Māori.
- Continual Pre-Training: Uses the BYOL (Bring Your Own Language) framework to efficiently adapt large language models to low-resource languages (see the training sketch after this list).
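
For orientation, here is a minimal sketch of what generic continual pre-training of a causal LM on a bilingual corpus looks like with the Hugging Face Trainer. This is not the BYOL framework's actual pipeline; the corpus file (mri_en_corpus.txt) and all hyperparameters are illustrative placeholders.

```python
# Generic causal-LM continual pre-training sketch (not the BYOL pipeline).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "google/gemma-3-1b-pt"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="bfloat16")

# Hypothetical bilingual Māori/English corpus: one document per line.
corpus = load_dataset("text", data_files={"train": "mri_en_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="byol-mri-1b-cpt",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized["train"],
    # mlm=False selects the standard next-token (causal) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice, CPT setups like this mix target-language and original-language data so the model gains Māori fluency without forgetting English, which matches the bilingual corpus described above.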
Good For
- Text Completion: As a base (non-instruction-tuned) model, it is best suited for generating continuations of given text prompts (see the usage sketch after this list).
- Research and Development: Ideal for researchers and developers working on natural language processing tasks in Māori or exploring methods for language adaptation in LLMs.
- Low-Resource Language Applications: Provides a foundation for building applications that require understanding and generation in Māori.
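
A minimal text-completion sketch using the standard transformers API, assuming a recent release with Gemma 3 support. The Māori prompt and sampling settings are illustrative, not recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai-for-good-lab/byol-mri-1b-cpt"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",
)

# As a base model, it continues text rather than following instructions.
prompt = "Ko te reo Māori te taonga"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a completion model, prompts should be written as text to be continued; chat templates or instruction formats from tuned Gemma variants do not apply here.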