ai-for-good-lab/byol-mri-1b-cpt
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 15, 2026 · License: gemma · Architecture: Transformer
ai-for-good-lab/byol-mri-1b-cpt is a 1-billion-parameter continually pre-trained language model, adapted from google/gemma-3-1b-pt specifically for the Māori language. Developed with the BYOL framework, it extends the base model's fluency in Māori while retaining its English capabilities. It is intended primarily for text completion in Māori and English and supports a 32,768-token context length.
BYOL Māori 1B CPT: Continual Pre-Training for Low-Resource Languages
This model, developed by ai-for-good-lab using the BYOL framework, is a continually pre-trained (CPT) language model adapted for Māori (mri). It is based on google/gemma-3-1b-pt, with 1 billion parameters and a 32,768-token context length.
Key Capabilities
- Māori Language Extension: Significantly enhances the base Gemma model's proficiency and fluency in Māori through training on a curated bilingual corpus of Māori and English text.
- Bilingual Support: Retains the original English language capabilities of the base model while expanding into Māori.
- Continual Pre-Training: Uses the BYOL (Bring Your Own Language) framework to efficiently adapt large language models to low-resource languages (see the training sketch after this list).
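
For orientation, here is a minimal sketch of what generic continual pre-training of a causal LM on a bilingual corpus looks like with the Hugging Face Trainer. This is not the BYOL framework's actual pipeline; the corpus file (mri_en_corpus.txt) and all hyperparameters are illustrative placeholders.

```python
# Generic causal-LM continual pre-training sketch (not the BYOL pipeline).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "google/gemma-3-1b-pt"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="bfloat16")

# Hypothetical bilingual Māori/English corpus: one document per line.
corpus = load_dataset("text", data_files={"train": "mri_en_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="byol-mri-1b-cpt",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized["train"],
    # mlm=False selects the standard next-token (causal) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice, CPT setups like this mix target-language and original-language data so the model gains Māori fluency without forgetting English, which matches the bilingual corpus described above.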
Good For
- Text Completion: As a base (non-instruction-tuned) model, it is best suited for generating continuations of given text prompts (see the usage sketch after this list).
- Research and Development: Ideal for researchers and developers working on natural language processing tasks in Māori or exploring methods for language adaptation in LLMs.
- Low-Resource Language Applications: Provides a foundation for building applications that require understanding and generation in Māori.
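
A minimal text-completion sketch using the standard transformers API, assuming a recent release with Gemma 3 support. The Māori prompt and sampling settings are illustrative, not recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai-for-good-lab/byol-mri-1b-cpt"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",
)

# As a base model, it continues text rather than following instructions.
prompt = "Ko te reo Māori te taonga"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a completion model, prompts should be written as text to be continued; chat templates or instruction formats from tuned Gemma variants do not apply here.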