pkupie/gemma-3-4b-ug-cpt
pkupie/gemma-3-4b-ug-cpt is a 4.3-billion-parameter Gemma 3 continual pretraining (CPT) checkpoint developed by pkupie. It is further pretrained on the Uyghur portion of the MC^2 Corpus and supports a 32,768-token context length. The model is designed to strengthen Uyghur language modeling and to support research on low-resource language adaptation.
Overview
This checkpoint was obtained by continually pretraining the base Gemma 3 PT 4B model on the Uyghur subset of the MC^2 Corpus. Its goal is to advance Uyghur language modeling and to facilitate research into adapting large language models to low-resource languages.
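For experimentation, the checkpoint can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the checkpoint exposes the standard Gemma 3 text-generation interface through AutoModelForCausalLM and that bf16 weights fit on the available device; the Uyghur prompt is an arbitrary example, not taken from the training data.

```python
# Minimal loading and generation sketch (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pkupie/gemma-3-4b-ug-cpt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights, as is common for Gemma 3
    device_map="auto",           # requires the accelerate package
)

# Arbitrary Uyghur prompt ("Uyghur language" in Arabic script).
prompt = "ئۇيغۇر تىلى"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because this is a pretrained (not instruction-tuned) checkpoint, it is best suited to continuation-style prompting rather than chat.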
Key Characteristics
- Base Model: Gemma 3 PT 4B
- Parameter Count: 4.3 billion
- Context Length: 32,768 tokens
- Training Data: Uyghur portion of the MC^2 Corpus
- Training Paradigm: Continual Pretraining (CPT)
- Research Focus: Low-resource language adaptation, specifically for Uyghur
Intended Use
This checkpoint is released primarily for research purposes. It is intended as a foundation for follow-up work, particularly on model merging and logit fusion techniques. The training methodology is detailed in the paper "Efficient Low-Resource Language Adaptation via Multi-Source Dynamic Logit Fusion" (ACL 2026).
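To make the fusion idea concrete, the sketch below interpolates next-token logits from this checkpoint and its base model with a fixed weight. This is an illustration of logit fusion in general, not the dynamic multi-source method from the paper; the base-model Hub ID google/gemma-3-4b-pt and the weight alpha are assumptions made for the example.

```python
# Static two-model logit interpolation: an illustrative sketch only,
# NOT the paper's multi-source dynamic fusion method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

cpt_id = "pkupie/gemma-3-4b-ug-cpt"   # Uyghur CPT checkpoint
base_id = "google/gemma-3-4b-pt"      # assumed Hub ID of the base Gemma 3 PT 4B

tokenizer = AutoTokenizer.from_pretrained(cpt_id)  # CPT keeps the base tokenizer
cpt = AutoModelForCausalLM.from_pretrained(cpt_id, torch_dtype=torch.bfloat16, device_map="auto")
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

alpha = 0.7  # hypothetical fixed weight on the Uyghur CPT logits
ids = tokenizer("ئۇيغۇر تىلى", return_tensors="pt").input_ids.to(cpt.device)

with torch.no_grad():
    for _ in range(32):  # greedy decoding over the fused distribution (no KV cache, for clarity)
        cpt_logits = cpt(input_ids=ids).logits[:, -1, :]
        base_logits = base(input_ids=ids.to(base.device)).logits[:, -1, :].to(cpt.device)
        fused = alpha * cpt_logits + (1 - alpha) * base_logits
        ids = torch.cat([ids, fused.argmax(dim=-1, keepdim=True)], dim=-1)

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

A dynamic variant would adjust the weight per token or per source model; see the paper for the actual method.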
Good For
- Researchers working on Uyghur language processing.
- Experiments involving continual pretraining and adaptation for low-resource languages (a perplexity comparison sketch follows this list).
- Developing and testing model merging or logit fusion strategies.
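One simple way to quantify what the CPT stage contributed is to compare token-level perplexity on held-out Uyghur text against the base model. The sketch below is a minimal version, assuming a user-supplied evaluation string; the placeholder must be replaced, as no evaluation data ships with the model.

```python
# Perplexity sketch for a single checkpoint; rerun with the base model ID to compare.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pkupie/gemma-3-4b-ug-cpt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

uyghur_text = "..."  # placeholder: substitute held-out Uyghur evaluation text
ids = tokenizer(uyghur_text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    # With labels=input_ids, the model returns the mean next-token cross-entropy.
    loss = model(input_ids=ids, labels=ids).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```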