canbingol/gemma3_1B_base-tr-cpt-2nd_epoch_stage2 is a 1B-parameter Gemma-3-1B model by canbingol: the second-epoch, Stage 2 checkpoint of a Turkish Continued Pretraining (CPT) pipeline. It is initialized from the preceding Turkish CPT checkpoint and trained for one epoch on samples 50,000 to 100,000 of a Turkish web corpus. The goal is domain adaptation to Turkish through sequential continued pretraining.
Overview
This model, canbingol/gemma3_1B_base-tr-cpt-2nd_epoch_stage2, is a Gemma-3-1B (1B-parameter) variant produced by Turkish Continued Pretraining (CPT). It is the second-epoch, Stage 2 checkpoint of a sequential training process, building on the canbingol/gemma3_1B_base-tr-cpt-2nd_epoch_stage1 checkpoint.
Key Characteristics
- Architecture: Gemma-3-1B base model.
- Training Objective: Continued Pretraining (CPT) for domain adaptation to Turkish.
- Initialization: Started from a prior Turkish CPT checkpoint, not the original google/gemma-3-1b-pt.
- Dataset: Trained on samples 50,000 to 100,000 of the canbingol/vngrs-web-corpus-200k Turkish web corpus.
- Epochs: Trained for 1 epoch on this specific data shard.
- Token Exposure: This stage added approximately 21.5 million tokens, bringing the cumulative exposure to around 129.2 million tokens across all CPT stages.
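A quick sanity check of the figures above (assuming the 50,000–100,000 shard contains exactly 50,000 samples, which the card implies but does not state outright):

```python
# Back-of-the-envelope check of the token counts reported for this stage.
SAMPLES = 100_000 - 50_000       # samples 50,000..100,000 of the corpus
STAGE_TOKENS = 21_500_000        # ~21.5M tokens added in this stage
CUMULATIVE_TOKENS = 129_200_000  # ~129.2M tokens across all CPT stages

# Average document length implied by the stage's token count.
avg_tokens_per_sample = STAGE_TOKENS / SAMPLES

# How much of the total CPT exposure this single stage contributes.
stage_share = STAGE_TOKENS / CUMULATIVE_TOKENS

print(f"~{avg_tokens_per_sample:.0f} tokens per sample")
print(f"this stage is ~{stage_share:.1%} of cumulative exposure")
```

The implied average of roughly 430 tokens per sample is typical for web-corpus documents, which is consistent with the stated dataset.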
Use Cases
This model is particularly suited for applications requiring a language model with enhanced understanding and generation capabilities in Turkish, benefiting from its specialized continued pretraining on a Turkish web corpus. It is part of a multi-stage, multi-epoch CPT process designed to progressively adapt the base Gemma model to the Turkish language.
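As a base (non-instruct) CPT checkpoint, the model is used for plain text continuation rather than chat. A minimal loading sketch with the `transformers` Auto classes (the repo id is taken from this card; the prompt and generation settings are illustrative, not prescribed by the card):

```python
MODEL_ID = "canbingol/gemma3_1B_base-tr-cpt-2nd_epoch_stage2"

def load_model(model_id: str = MODEL_ID):
    """Fetch the checkpoint from the Hugging Face Hub; returns (tokenizer, model)."""
    # Imported inside the function so the sketch can be read and the constants
    # used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model

def complete(prompt: str, max_new_tokens: int = 40) -> str:
    """Continue a Turkish prompt with greedy decoding (illustrative settings)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage (downloads the model on first call):
# print(complete("İstanbul, Türkiye'nin"))
```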