canbingol/gemma3_1B_base-tr-cpt-2nd_epoch_stage1 is a 1 billion parameter Gemma-3-1B model, specifically a second-epoch continued pretraining (CPT) variant optimized for Turkish. It builds on a fully trained first-epoch checkpoint, undergoing further domain adaptation by re-exposing the model to the initial subset of the Turkish web corpus. It is designed for stronger Turkish language generation and understanding, with cumulative pretraining exposure of approximately 107.7 million tokens after this stage.
## Overview
This model, canbingol/gemma3_1B_base-tr-cpt-2nd_epoch_stage1, is a 1 billion parameter Gemma-3-1B variant that has undergone second-epoch continued pretraining (CPT) specifically for the Turkish language. It is initialized from the checkpoint of the completed first epoch (canbingol/gemma3_1B_base-tr-cpt-1epoch_stage4), indicating a refinement and further adaptation process rather than initial training.
## Key Characteristics
- Architecture: Based on the Gemma-3-1B model.
- Language Focus: Optimized for Turkish through continued pretraining.
- Training Data: Trained on samples 0-50,000 of the canbingol/vngrs-web-corpus-200k Turkish web corpus during this stage.
- Training Method: Sequential CPT across disjoint data shards, with this stage representing the beginning of the second epoch's pass over the initial data subset.
- Cumulative Exposure: Approximately 107.7 million tokens after this stage, building on 86.1 million tokens from the first epoch.
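The cumulative-exposure figures above imply the token budget of this stage. A quick back-of-the-envelope check (the per-sample average is an illustrative derivation from the card's numbers, not an official statistic):

```python
# Token accounting for this CPT stage, using the figures from the model card.
first_epoch_tokens = 86.1e6    # cumulative tokens after the full first epoch
cumulative_tokens = 107.7e6    # cumulative tokens after this stage
samples_this_stage = 50_000    # samples 0-50,000 of the corpus

# Tokens consumed by this stage alone, and the implied average sample length.
stage_tokens = cumulative_tokens - first_epoch_tokens
avg_tokens_per_sample = stage_tokens / samples_this_stage

print(f"{stage_tokens / 1e6:.1f}M tokens this stage, "
      f"~{avg_tokens_per_sample:.0f} tokens/sample")
# → 21.6M tokens this stage, ~432 tokens/sample
```

So this stage re-exposed the model to roughly 21.6M tokens, consistent with one pass over the first 50,000 corpus samples.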
## Intended Use Cases
- Turkish Language Generation: Ideal for tasks requiring text generation in Turkish.
- Turkish NLP Applications: Suitable for various natural language processing tasks where strong Turkish language understanding is beneficial.
- Further Adaptation: Serves as a strong base for additional fine-tuning on specific Turkish datasets or tasks.
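For generation or as a starting point for fine-tuning, the checkpoint should load through the standard transformers causal-LM API. A minimal sketch (assumes `transformers` and `torch` are installed and that the repository is publicly accessible; this is a base model, so it completes raw text rather than following instructions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "canbingol/gemma3_1B_base-tr-cpt-2nd_epoch_stage1"

def generate_turkish(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedily continue a Turkish prompt with the CPT checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example usage (downloads the model weights on first call):
# print(generate_turkish("Türkiye'nin en büyük şehri"))
```

Because this is a base checkpoint, prompts work best as text to be continued; instruction-style prompting would require further supervised fine-tuning.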