ale-bay/zephyr-2b-gemma-sft

Hugging Face
Text generation · Model size: 2.6B · Quantization: BF16 · Context length: 8k · License: gemma · Architecture: Transformer

ale-bay/zephyr-2b-gemma-sft is a 2.6 billion parameter language model fine-tuned from google/gemma-2b. It was instruction-tuned on the HuggingFaceH4/deita-10k-v0-sft dataset and is intended for general language generation tasks, reaching a validation loss of 1.0529 after 3 epochs of training.


Overview

The model builds on Google's Gemma-2B base and was supervised fine-tuned on HuggingFaceH4/deita-10k-v0-sft to improve its performance on instruction-following tasks. Training ran for 3 epochs and finished with a validation loss of 1.0529.

Key Capabilities

  • Instruction Following: Fine-tuned on a supervised instruction dataset to improve response generation based on given prompts.
  • General Language Generation: Suitable for a range of text generation tasks due to its foundational Gemma architecture and instruction tuning.
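A minimal usage sketch with the Hugging Face `transformers` pipeline. The `format_prompt` helper and its ChatML-style tags are assumptions for illustration only; the tokenizer shipped with the model defines the actual chat template (via `tokenizer.apply_chat_template`), which should be preferred.

```python
# Sketch: querying the model with the transformers library.
# The chat-template details below are assumptions -- check the
# tokenizer's own chat template before relying on this format.

def format_prompt(messages):
    """Hypothetical helper: render a chat as a ChatML-style string.

    The real template is stored with the tokenizer and may differ.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

if __name__ == "__main__":
    # Heavy path: requires `transformers` and `torch` installed,
    # and downloads ~2.6B parameters of BF16 weights on first use.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="ale-bay/zephyr-2b-gemma-sft")
    prompt = format_prompt(
        [{"role": "user", "content": "Explain beam search briefly."}]
    )
    out = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
    print(out[0]["generated_text"])
```

For chat-style prompting, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` produces the template the model was actually trained with.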

Training Details

Training used a learning rate of 2e-05, a total batch size of 128, and the Adam optimizer. It was conducted across 8 GPUs with a cosine learning-rate scheduler and a warmup ratio of 0.1.
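The stated schedule can be sketched as a small function: linear warmup over the first 10% of steps up to the peak rate of 2e-05, then cosine decay to zero. The total step count below is illustrative, since the card does not state it, and whether gradient accumulation was used is an assumption.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then cosine decay.

    total_steps is illustrative; the model card does not report it.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))

# A total batch size of 128 across 8 GPUs implies 16 samples per device
# per step if no gradient accumulation is used (an assumption).
```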

Intended Use Cases

  • Prototyping: Can serve as a base for further fine-tuning on more specific datasets.
  • Research: Useful for exploring the effects of instruction tuning on Gemma-2B with the deita-10k-v0-sft dataset.