Columbia-NLP/gemma-2b-zephyr-sft

Task: Text Generation · Model Size: 2.5B · Quant: BF16 · Context Length: 8k · Published: Apr 11, 2024 · License: gemma-terms-of-use · Architecture: Transformer

Columbia-NLP/gemma-2b-zephyr-sft is a 2.5 billion parameter GPT-like model, fine-tuned by Columbia-NLP from Google's Gemma-2b on the deita-10k-v0-sft dataset. It is primarily an English-language model, optimized for supervised fine-tuning (SFT) performance through careful hyperparameter selection and masking of user tokens during training. The model shows improved performance across benchmarks, including a 48.75 average on the OpenLLM Leaderboard and an overall 4.34 on MT-Bench, making it suitable for general conversational AI tasks where a smaller, efficient model with strong SFT performance is desired.


Columbia-NLP/gemma-2b-zephyr-sft Overview

This model is a 2.5 billion parameter, English-centric, GPT-like language model developed by Columbia-NLP. It is a supervised fine-tuned (SFT) version of the original google/gemma-2b base model, trained using the deita-10k-v0-sft dataset. Key to its development was the careful selection of hyperparameters and masking of user tokens during training to enhance its SFT performance.
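For reference, below is a minimal inference sketch using the Hugging Face transformers library. It assumes the tokenizer ships a chat template (standard for zephyr-style SFT releases, but worth verifying on the model card) and that torch and transformers are installed; the prompt text is illustrative.

```python
# Minimal chat-style inference sketch for this model (not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Columbia-NLP/gemma-2b-zephyr-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",
)

# Format the conversation with the tokenizer's chat template
# (assumed to be present for this zephyr-style release).
messages = [
    {"role": "user", "content": "Explain supervised fine-tuning in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```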

Key Capabilities

  • Enhanced Supervised Fine-Tuning: Optimized for tasks requiring strong performance from supervised fine-tuning.
  • Competitive Benchmarking: Achieves an average score of 48.75 on the OpenLLM Leaderboard, outperforming its base model and other Gemma-2b variants in several categories, including ARC (51.80), HellaSwag (72.63), MMLU (42.20), TruthfulQA (41.96), and GSM8k (20.09).
  • Solid MT-Bench Performance: Scores a total of 4.34 on MT-Bench, with notable performance in Humanities (6.25) and Roleplay (5.55).

Good For

  • Applications requiring a compact yet capable language model for general English text generation and understanding.
  • Use cases where a model with strong SFT performance and competitive benchmark results in the 2B parameter class is beneficial.
  • Researchers and developers looking for an efficient model derived from the Gemma family with specific fine-tuning optimizations.
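For developers who just want a quick start, a sketch using the higher-level pipeline API is below; it assumes a recent transformers version that accepts chat-message input for text-generation pipelines, and a GPU picked up via device_map="auto".

```python
# Quick-start sketch via the transformers pipeline API (assumptions above).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Columbia-NLP/gemma-2b-zephyr-sft",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Suggest three names for a coffee shop."}]
result = generator(messages, max_new_tokens=128)
# For chat input, generated_text is the full message list; the last entry
# is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```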