KrisMinchev/collapse_gemma-2-2b_hs2_replace_iter1_sftsd0

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 2.6B parameters
  • Quantization: BF16
  • Context length: 8K tokens
  • Published: Jan 9, 2026
  • License: Gemma
  • Architecture: Transformer

KrisMinchev/collapse_gemma-2-2b_hs2_replace_iter1_sftsd0 is a fine-tuned Gemma-2-2b model with 2.6 billion parameters and an 8192-token context length. Developed by KrisMinchev, it is the first iteration (iter1) of a fine-tuning process on an unspecified dataset, reaching a validation loss of 1.0637. It is suitable for general language generation tasks where a compact, fine-tuned Gemma 2 model is beneficial.


Model Overview

This model, collapse_gemma-2-2b_hs2_replace_iter1_sftsd0, is a fine-tuned variant of Google's Gemma-2-2b architecture, featuring 2.6 billion parameters. It was developed by KrisMinchev through an iterative fine-tuning process, as indicated by its name. The model maintains an 8192-token context window, characteristic of the base Gemma 2 models.
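The checkpoint can be loaded like any other Gemma 2 model through the Hugging Face transformers library. The snippet below is a minimal inference sketch, assuming transformers is installed and a BF16-capable device is available; the prompt and sampling settings are illustrative, not taken from the model card.

```python
# Minimal inference sketch; dtype chosen to match the listed BF16 precision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KrisMinchev/collapse_gemma-2-2b_hs2_replace_iter1_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

prompt = "Explain the difference between fine-tuning and pretraining in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```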

Training Details

The fine-tuning was conducted over a single epoch with a learning rate of 8e-06 and a total batch size of 128 (achieved with train_batch_size=8 and gradient_accumulation_steps=16). The optimizer used was Adam with standard betas and epsilon, and a constant learning rate scheduler with a 0.05 warmup ratio. During training, the model processed approximately 5.7 million input tokens, achieving a final validation loss of 1.0637.
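For reference, the reported hyperparameters map roughly onto transformers' TrainingArguments as shown below. This is a reconstruction for illustration only, assuming the standard Trainer API; the dataset, tokenization, and hardware setup are not published, and the seed value is inferred from the "sd0" suffix in the model name.

```python
# Hypothetical reconstruction of the reported training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter1_sftsd0",
    num_train_epochs=1,                        # single epoch, as reported
    learning_rate=8e-6,                        # reported learning rate
    per_device_train_batch_size=8,             # train_batch_size=8
    gradient_accumulation_steps=16,            # 8 * 16 = total batch size of 128
    lr_scheduler_type="constant_with_warmup",  # constant schedule with warmup
    warmup_ratio=0.05,                         # reported warmup ratio
    adam_beta1=0.9,                            # "standard" Adam betas and epsilon
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                                 # matches the BF16 precision listed above
    seed=0,                                    # assumption based on the "sd0" suffix
)
```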

Key Characteristics

  • Base Architecture: Fine-tuned from google/gemma-2-2b.
  • Parameter Count: 2.6 billion parameters.
  • Context Length: Supports an 8192-token context window.
  • Training Objective: Optimized for general language modeling, as indicated by the reported validation loss.

Potential Use Cases

Given its fine-tuned nature and compact size, this model could be suitable for:

  • General text generation and completion tasks.
  • Applications requiring a smaller, efficient language model based on the Gemma 2 architecture.
  • Further experimentation or fine-tuning on specific downstream tasks where the base model's characteristics are desired.