pkupie/gemma-3-4b-kk-cpt
pkupie/gemma-3-4b-kk-cpt is a 4.3-billion-parameter Gemma 3 model continually pretrained on the Kazakh (Arabic script) portion of the MC^2 Corpus. Developed by pkupie, the model has a 32768-token context length and is optimized for Kazakh language modeling and low-resource language adaptation research. It serves as a base model for further research in areas such as model merging and logit fusion.
Overview
This model, pkupie/gemma-3-4b-kk-cpt, is a 4.3-billion-parameter Gemma 3 base model that has undergone continual pretraining (CPT). It is specialized for Kazakh written in the Arabic script, using the corresponding subset of the MC^2 Corpus as its training data.
Key Capabilities
- Enhanced Kazakh Language Modeling: Improves language-modeling performance on Kazakh text in the Arabic script relative to the original Gemma 3 checkpoint.
- Low-Resource Language Adaptation: Designed to support research and development in adapting large language models to languages with limited data.
- Research Base Model: Intended as a foundational checkpoint for further academic exploration, particularly in advanced techniques like model merging and logit fusion.
Training Details
The model's training methodology is detailed in the paper "Efficient Low-Resource Language Adaptation via Multi-Source Dynamic Logit Fusion" (ACL 2026). Training follows a continual-pretraining (CPT) recipe starting from the original Gemma 3 PT 4B checkpoint.
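To give an intuition for the logit-fusion direction this checkpoint is meant to support, here is a toy sketch of static logit fusion: next-token logits from several source models are combined as a weighted sum before the softmax. This is an illustrative simplification written for this card, not the paper's dynamic weighting algorithm; the models, vocabulary, and weights below are hypothetical.

```python
import numpy as np

def fuse_logits(logits_list, weights):
    """Weighted fusion of per-model next-token logits.

    Toy illustration with fixed weights; the paper's method computes
    the weights dynamically per step (not shown here).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalize fusion weights
    stacked = np.stack(logits_list)        # shape: (n_models, vocab_size)
    return (weights[:, None] * stacked).sum(axis=0)

def softmax(x):
    z = x - x.max()                        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Two hypothetical source models over a 4-token vocabulary
model_a = np.array([2.0, 0.5, 0.1, -1.0])
model_b = np.array([0.0, 1.5, 0.2, 0.3])

fused = fuse_logits([model_a, model_b], weights=[0.7, 0.3])
probs = softmax(fused)                     # fused next-token distribution
```

The fused distribution can then be sampled or decoded exactly like the output of a single model, which is what makes this family of techniques attractive for adapting a base checkpoint like this one without further training.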
Intended Use Cases
- Academic Research: Ideal for researchers studying low-resource language processing, model adaptation, and multilingual NLP.
- Base for Fine-tuning: Can serve as a strong starting point for fine-tuning on specific Kazakh language tasks.
- Experimentation: Suitable for exploring novel approaches in model merging and logit fusion within a low-resource context.
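For the fine-tuning and experimentation use cases above, the checkpoint should load like any other causal Gemma 3 model via Hugging Face transformers. A minimal sketch, assuming a transformers version with Gemma 3 support (>= 4.50) and enough memory for a 4B model; the prompt is a placeholder and, since this is a base (non-instruction-tuned) model, it expects plain text-completion prompts:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "pkupie/gemma-3-4b-kk-cpt"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation from the CPT base model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Replace with a Kazakh (Arabic script) prompt of your choice
    print(generate("..."))
```

For downstream fine-tuning, the same `AutoModelForCausalLM.from_pretrained(MODEL_ID)` call is the usual starting point for standard trainer or PEFT workflows.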