KrisMinchev/collapse_gemma-2-2b_hs2_replace_iter1_sftsd0

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 2.6B parameters
  • Quantization: BF16
  • Context length: 8K tokens
  • Published: Jan 9, 2026
  • License: Gemma
  • Architecture: Transformer

KrisMinchev/collapse_gemma-2-2b_hs2_replace_iter1_sftsd0 is a fine-tuned Gemma-2-2b model with 2.6 billion parameters and an 8192-token context length. Developed by KrisMinchev, it is the first iteration (iter1) of a fine-tuning process on an unspecified dataset, reaching a validation loss of 1.0637. It is suitable for general language generation tasks where a compact, fine-tuned Gemma 2 model is beneficial.


Model Overview

This model, collapse_gemma-2-2b_hs2_replace_iter1_sftsd0, is a fine-tuned variant of Google's Gemma-2-2b architecture, featuring 2.6 billion parameters. It was developed by KrisMinchev through an iterative fine-tuning process, as indicated by its name. The model maintains an 8192-token context window, characteristic of the base Gemma 2 models.
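The checkpoint can be loaded like any other Gemma 2 model through the Hugging Face transformers library. The snippet below is a minimal inference sketch, assuming transformers is installed and a BF16-capable device is available; the prompt and sampling settings are illustrative, not taken from the model card.

```python
# Minimal inference sketch; dtype chosen to match the listed BF16 precision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KrisMinchev/collapse_gemma-2-2b_hs2_replace_iter1_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

prompt = "Explain the difference between fine-tuning and pretraining in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```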

Training Details

The fine-tuning was conducted over a single epoch with a learning rate of 8e-06 and a total batch size of 128 (achieved with train_batch_size=8 and gradient_accumulation_steps=16). The optimizer used was Adam with standard betas and epsilon, and a constant learning rate scheduler with a 0.05 warmup ratio. During training, the model processed approximately 5.7 million input tokens, achieving a final validation loss of 1.0637.
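For reference, the reported hyperparameters map roughly onto transformers' TrainingArguments as shown below. This is a reconstruction for illustration only, assuming the standard Trainer API; the dataset, tokenization, and hardware setup are not published, and the seed value is inferred from the "sd0" suffix in the model name.

```python
# Hypothetical reconstruction of the reported training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter1_sftsd0",
    num_train_epochs=1,                        # single epoch, as reported
    learning_rate=8e-6,                        # reported learning rate
    per_device_train_batch_size=8,             # train_batch_size=8
    gradient_accumulation_steps=16,            # 8 * 16 = total batch size of 128
    lr_scheduler_type="constant_with_warmup",  # constant schedule with warmup
    warmup_ratio=0.05,                         # reported warmup ratio
    adam_beta1=0.9,                            # "standard" Adam betas and epsilon
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                                 # matches the BF16 precision listed above
    seed=0,                                    # assumption based on the "sd0" suffix
)
```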

Key Characteristics

  • Base Architecture: Fine-tuned from google/gemma-2-2b.
  • Parameter Count: 2.6 billion parameters.
  • Context Length: Supports an 8192-token context window.
  • Training Objective: Optimized for general language modeling, as indicated by the reported validation loss.

Potential Use Cases

Given its fine-tuned nature and compact size, this model could be suitable for:

  • General text generation and completion tasks.
  • Applications requiring a smaller, efficient language model based on the Gemma 2 architecture.
  • Further experimentation or fine-tuning on specific downstream tasks where the base model's characteristics are desired.