wandb/gemma-2b-zephyr-sft

Hosted on Hugging Face

  • Task: Text generation
  • Model size: 2.6B parameters
  • Quantization: BF16
  • Context length: 8K
  • Published: Feb 28, 2024
  • License: gemma-terms-of-use
  • Architecture: Transformer

wandb/gemma-2b-zephyr-sft is a 2.5-billion-parameter decoder-only language model fine-tuned by Weights & Biases (wandb) from Google's Gemma 2B base model. It applies the Zephyr supervised fine-tuning (SFT) recipe to strengthen its conversational and instruction-following capabilities. The model is primarily English-focused, is suited to general-purpose text generation and understanding tasks, and achieves an average score of 47.18 on the Open LLM Leaderboard.


Model Overview

wandb/gemma-2b-zephyr-sft is a 2.5 billion parameter language model developed by wandb, building upon Google's Gemma 2B base model. It incorporates the Supervised Fine-Tuning (SFT) recipe from the Zephyr project, which involves training on a diverse mix of publicly available and synthetic datasets. This fine-tuning process aims to improve the model's ability to follow instructions and generate coherent, contextually relevant text.
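Because the SFT recipe trains on chat-formatted data, inference prompts should follow the model's chat template. The sketch below formats a conversation by hand, assuming the turn markers from the original Zephyr recipe (`<|user|>`, `<|assistant|>`, `</s>`); this is an illustration only, and the authoritative template ships with the model's tokenizer (apply it via `tokenizer.apply_chat_template` in practice).

```python
# Sketch: build a Zephyr-style chat prompt by hand. Assumes the turn markers
# from the original Zephyr recipe; the model's actual template may differ and
# should be taken from its tokenizer via apply_chat_template.

def build_zephyr_prompt(messages):
    """Format a list of {"role", "content"} dicts as a Zephyr-style prompt."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}</s>\n")
    # End with the assistant tag so the model continues as the assistant.
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_zephyr_prompt([
    {"role": "user", "content": "What is supervised fine-tuning?"},
])
print(prompt)
```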

Key Capabilities & Performance

This model is primarily designed for English-language tasks. It achieves an average score of 47.18 on the Open LLM Leaderboard, with the following per-benchmark results:

  • AI2 Reasoning Challenge (25-Shot): 49.74
  • HellaSwag (10-Shot): 72.38
  • MMLU (5-Shot): 41.37
  • TruthfulQA (0-shot): 34.42
  • Winogrande (5-shot): 66.93
  • GSM8k (5-shot): 18.27
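The reported leaderboard average is simply the unweighted mean of the six benchmark scores above, which can be checked directly:

```python
# Open LLM Leaderboard scores reported for wandb/gemma-2b-zephyr-sft.
scores = {
    "ARC (25-shot)": 49.74,
    "HellaSwag (10-shot)": 72.38,
    "MMLU (5-shot)": 41.37,
    "TruthfulQA (0-shot)": 34.42,
    "Winogrande (5-shot)": 66.93,
    "GSM8k (5-shot)": 18.27,
}

# The unweighted mean matches the reported average of 47.18.
average = sum(scores.values()) / len(scores)
print(average)
```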

Training Details

The model was trained with Hugging Face's Alignment Handbook recipe, with runs logged to a Weights & Biases workspace. Training completed in roughly 2 hours on a single 8×A100 (80 GB) node provided by Lambda Labs.

Use Cases

Given its fine-tuned nature and performance metrics, this model is suitable for various general-purpose natural language processing tasks, particularly those requiring instruction following and conversational abilities. Its relatively small size (2.5B parameters) makes it a candidate for applications where computational resources are a consideration.
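For such applications, the model can be loaded through the `transformers` pipeline API. The following is a minimal, untested sketch (not the model authors' official example); it assumes the `transformers` and `torch` packages are installed, enough memory for the ~2.5B-parameter BF16 weights, and a recent `transformers` version whose text-generation pipeline accepts chat-message lists.

```python
# Usage sketch (assumption: recent `transformers` + `torch` installed; the
# text-generation pipeline applies the tokenizer's stored chat template when
# given a list of messages). Imports live inside the function so the sketch
# can be defined without the packages present.

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a chat completion with wandb/gemma-2b-zephyr-sft."""
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="wandb/gemma-2b-zephyr-sft",
        torch_dtype=torch.bfloat16,  # matches the published BF16 weights
        device_map="auto",
    )
    messages = [{"role": "user", "content": prompt}]
    outputs = pipe(messages, max_new_tokens=max_new_tokens, do_sample=False)
    # The pipeline returns the conversation with the assistant reply appended.
    return outputs[0]["generated_text"][-1]["content"]

# Example (downloads the weights on first use):
# print(generate("Explain supervised fine-tuning in one paragraph."))
```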