tzchen07/Gemma2-2B-SFT-X8c-2ep

TEXT GENERATIONConcurrency Cost:1Model Size:2.6BQuant:BF16Ctx Length:8kPublished:May 26, 2026License:otherArchitecture:Transformer Cold

The tzchen07/Gemma2-2B-SFT-X8c-2ep model is a 2.6 billion parameter language model, fine-tuned from unsloth/gemma-2-2b-it. This model leverages a context length of 8192 tokens and is specifically adapted through supervised fine-tuning on the v1_6_plus_v1_8_plus_v1_6c dataset. It is designed for general language understanding and generation tasks, building upon the Gemma 2 architecture.

Loading preview...

Model Overview

This model, tzchen07/Gemma2-2B-SFT-X8c-2ep, is a 2.6 billion parameter language model derived from the Gemma 2 architecture. It has been specifically fine-tuned from unsloth/gemma-2-2b-it using a supervised fine-tuning (SFT) approach. The training utilized the v1_6_plus_v1_8_plus_v1_6c dataset, enhancing its capabilities for general language tasks.

Key Training Details

The fine-tuning process involved specific hyperparameters to optimize performance:

  • Learning Rate: 5e-06
  • Batch Size: A train_batch_size of 4 and eval_batch_size of 8, with a gradient_accumulation_steps of 16, resulting in a total_train_batch_size of 64.
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1 over 2 epochs.

Intended Use Cases

While specific intended uses and limitations require further information, as a fine-tuned Gemma 2-2B model, it is generally suitable for a range of natural language processing applications where a compact yet capable model is beneficial. Its training on a specific dataset suggests potential strengths in areas covered by that data, making it a candidate for tasks requiring nuanced understanding and generation based on its fine-tuning.