Overview
This model, canbingol/gemma3_1B_base-tr-cpt-1epoch_stage4, is a 1-billion-parameter Gemma-3-1B variant that has completed Stage 4 of a Turkish Continued Pretraining (CPT) pipeline. It was initialized from canbingol/gemma3_1B_base-tr-cpt-1epoch_stage3, making it a direct continuation of the previous stage's training.
Key Characteristics
- Turkish Language Focus: Specifically adapted for the Turkish language through continued pretraining on a Turkish web corpus.
- Sequential CPT: This model is the culmination of a four-stage sequential CPT process, where each stage trained on a disjoint shard of the dataset.
- Cumulative Data Exposure: By the end of Stage 4, the model has been cumulatively exposed to 200,000 samples from the canbingol/vngrs-web-corpus-200k dataset.
- Training Objective: Continued pretraining for 1 epoch on samples 150,000–200,000, inheriting adaptations from prior stages.
- Token Count: This stage alone processed approximately 21.6 million tokens, contributing to a cumulative total of around 86.1 million tokens across all four stages.
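The token counts above imply an average sample length, which can be checked with quick arithmetic. The sketch below uses only the approximate figures stated in this card; the variable names are illustrative:

```python
# Back-of-the-envelope check of the token counts reported above.
# Figures are the approximate values from this card; actual per-stage
# counts vary slightly around the average.
STAGE4_SAMPLES = 50_000          # samples 150,000-200,000
STAGE4_TOKENS = 21_600_000       # ~21.6M tokens processed in Stage 4
CUMULATIVE_TOKENS = 86_100_000   # ~86.1M tokens across Stages 1-4

tokens_per_sample = STAGE4_TOKENS / STAGE4_SAMPLES
avg_stage_tokens_m = CUMULATIVE_TOKENS / 4 / 1e6

print(f"~{tokens_per_sample:.0f} tokens per sample in Stage 4")
print(f"~{avg_stage_tokens_m:.1f}M tokens per stage on average")
```

This works out to roughly 430 tokens per web-corpus sample, consistent with the per-stage and cumulative totals reported above.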
Training Lineage
This model's training lineage is a sequential progression:
- Stage 0: google/gemma-3-1b-pt
- Stage 1: Samples 0–50,000
- Stage 2: Samples 50,000–100,000
- Stage 3: Samples 100,000–150,000
- Stage 4 (this model): Samples 150,000–200,000, completing the first full epoch over the 200K-sample dataset.
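The shard boundaries above follow a simple pattern: four equal 50,000-sample slices of the 200K corpus. A minimal sketch of that mapping (the helper name `stage_range` is illustrative, not part of the actual training code):

```python
# Sketch: map a CPT stage number (1-4) to its sample-index shard,
# assuming the equal 50,000-sample shards described above.
TOTAL_SAMPLES = 200_000
NUM_STAGES = 4
SHARD_SIZE = TOTAL_SAMPLES // NUM_STAGES  # 50,000

def stage_range(stage: int) -> tuple[int, int]:
    """Return the (start, end) sample indices trained on in `stage`."""
    if not 1 <= stage <= NUM_STAGES:
        raise ValueError("stage must be between 1 and 4")
    return (stage - 1) * SHARD_SIZE, stage * SHARD_SIZE

print(stage_range(4))  # (150000, 200000) -- this model's shard
```

Because the shards are disjoint and cover the full index range, completing Stage 4 closes out exactly one epoch over the dataset.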
Use Cases
This model is suitable for applications requiring a compact, Turkish-centric language model, particularly for tasks benefiting from its domain adaptation to Turkish web content.