canbingol/gemma3_1B_base-tr-cpt-1epoch_stage4

Text Generation · Model Size: 1B · Quantization: BF16 · Context Length: 32k · Published: Mar 4, 2026 · Architecture: Transformer

canbingol/gemma3_1B_base-tr-cpt-1epoch_stage4 is a 1-billion-parameter Gemma-3-1B variant by canbingol, adapted for Turkish language understanding through continued pretraining. It is the fourth and final stage of a sequential continued-pretraining process and has been cumulatively exposed to 200,000 Turkish web-corpus samples. The model targets applications that need a compact, Turkish-focused language model and builds on the adaptations of the earlier training stages.


Overview

This model, canbingol/gemma3_1B_base-tr-cpt-1epoch_stage4, is a 1 billion parameter Gemma-3-1B variant that has undergone Stage 4 of Turkish Continued Pretraining (CPT). It was initialized from canbingol/gemma3_1B_base-tr-cpt-1epoch_stage3, making it a direct continuation of previous training efforts.

Key Characteristics

  • Turkish Language Focus: Specifically adapted for the Turkish language through continued pretraining on a Turkish web corpus.
  • Sequential CPT: This model is the culmination of a four-stage sequential CPT process, where each stage trained on a disjoint shard of the dataset.
  • Cumulative Data Exposure: By the end of Stage 4, the model has been cumulatively exposed to 200,000 samples from the canbingol/vngrs-web-corpus-200k dataset.
  • Training Objective: Continued Pretraining for 1 epoch on samples 150,000–200,000, inheriting adaptations from prior stages.
  • Token Count: This stage alone processed approximately 21.6 million tokens, contributing to a cumulative total of around 86.1 million tokens across all four stages.
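
The sample and token figures above can be roughly checked with a sketch like the one below, assuming canbingol/vngrs-web-corpus-200k exposes a plain `train` split with a `text` column (the column name is an assumption) and that token counts use the base Gemma-3-1B tokenizer:

```python
# Rough sketch: re-count the Stage-4 shard and its token total.
# Assumptions: the dataset has a "train" split with a "text" column,
# and token counts use the base google/gemma-3-1b-pt tokenizer.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-pt")

# Stage 4 covers samples 150,000-200,000 of the 200K-sample corpus.
shard = load_dataset("canbingol/vngrs-web-corpus-200k", split="train[150000:200000]")

total_tokens = sum(
    len(tokenizer(row["text"], add_special_tokens=False)["input_ids"])
    for row in shard
)
print(f"Stage-4 shard: {len(shard):,} samples, ~{total_tokens / 1e6:.1f}M tokens")
```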

Training Lineage

This model's training lineage is a sequential progression:

  • Stage 0 (base model): google/gemma-3-1b-pt
  • Stage 1: Samples 0–50,000
  • Stage 2: Samples 50,000–100,000
  • Stage 3: Samples 100,000–150,000
  • Stage 4 (this model): Samples 150,000–200,000, completing the first full epoch over the 200K-sample dataset.
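
Written out as a small schedule, the same lineage looks like the sketch below; only the base model and the Stage 3/4 repositories are named on this card, so the Stage 1 and Stage 2 checkpoint names are assumptions inferred from the naming pattern:

```python
# Illustrative CPT schedule: (stage, checkpoint it was initialized from,
# sample range in canbingol/vngrs-web-corpus-200k). Stage 1/2 repo names
# are assumed from the naming pattern, not confirmed by this card.
BASE_MODEL = "google/gemma-3-1b-pt"

CPT_SCHEDULE = [
    (1, BASE_MODEL, (0, 50_000)),
    (2, "canbingol/gemma3_1B_base-tr-cpt-1epoch_stage1", (50_000, 100_000)),
    (3, "canbingol/gemma3_1B_base-tr-cpt-1epoch_stage2", (100_000, 150_000)),
    (4, "canbingol/gemma3_1B_base-tr-cpt-1epoch_stage3", (150_000, 200_000)),  # this model
]

for stage, init_checkpoint, (start, end) in CPT_SCHEDULE:
    print(f"Stage {stage}: init from {init_checkpoint}, samples {start:,}-{end:,}")
```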

Use Cases

This model is suitable for applications requiring a compact, Turkish-centric language model, particularly for tasks benefiting from its domain adaptation to Turkish web content.
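
A minimal inference sketch, assuming the checkpoint loads through the standard transformers causal-LM API like other Gemma-3 models; the Turkish prompt and generation settings are illustrative only (this is a base/CPT model, not instruction-tuned):

```python
# Minimal usage sketch (assumptions noted above): load the model in BF16
# and complete a Turkish prompt. Requires torch, transformers, accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "canbingol/gemma3_1B_base-tr-cpt-1epoch_stage4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# "Türkiye'nin en kalabalık şehri" = "Turkey's most populous city"
prompt = "Türkiye'nin en kalabalık şehri"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```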