wandb/gemma-7b-zephyr-dpo

Text Generation · Concurrency Cost: 1 · Model Size: 8.5B · Quant: FP8 · Context Length: 8k · Published: Feb 28, 2024 · License: gemma-terms-of-use · Architecture: Transformer

wandb/gemma-7b-zephyr-dpo is an 8.5 billion parameter GPT-like language model, fine-tuned with the DPO (Direct Preference Optimization) recipe on top of an SFT (Supervised Fine-Tuning) Gemma 7B base. Developed by wandb, the model is primarily English-language and performs well on general reasoning and language understanding tasks, achieving an average score of 61.62 on the Open LLM Leaderboard benchmarks. It is suitable for applications requiring robust conversational AI and instruction-following capabilities.


Overview

wandb/gemma-7b-zephyr-dpo is an 8.5 billion parameter GPT-like model, developed by wandb, that has been fine-tuned using the Direct Preference Optimization (DPO) recipe. This DPO application was performed on top of a Supervised Fine-Tuning (SFT) version of the Gemma 7B model, specifically wandb/gemma-7b-zephyr-sft. The training process utilized the DPO script from the Hugging Face alignment-handbook recipe, with logging to Weights & Biases.
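DPO trains the model to prefer chosen responses over rejected ones by comparing the policy's log-probabilities against a frozen reference model (here, the SFT checkpoint). A minimal sketch of the per-example DPO loss, for illustration only (the actual training uses the alignment-handbook's TRL-based implementation; the function name and inputs here are hypothetical):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# When the policy favors the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2) (the zero-margin value).
loss = dpo_loss(-10.0, -14.0, -11.0, -12.0)
```

The `beta` hyperparameter controls how far the policy is allowed to drift from the reference model; small values (e.g. 0.01–0.1) are typical in alignment-handbook configs.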

Key Capabilities & Performance

This model is primarily English-language and demonstrates strong general language understanding and reasoning abilities. Its performance on the Open LLM Leaderboard includes:

  • Avg. Score: 61.62
  • AI2 Reasoning Challenge (25-shot): 60.84
  • HellaSwag (10-shot): 80.44
  • MMLU (5-shot): 60.60
  • TruthfulQA (0-shot): 42.48
  • Winogrande (5-shot): 75.37
  • GSM8k (5-shot): 49.96
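The leaderboard average is the simple (unweighted) mean of the six benchmark scores listed above:

```python
# Open LLM Leaderboard scores as reported above.
scores = {
    "ARC (25-shot)": 60.84,
    "HellaSwag (10-shot)": 80.44,
    "MMLU (5-shot)": 60.60,
    "TruthfulQA (0-shot)": 42.48,
    "Winogrande (5-shot)": 75.37,
    "GSM8k (5-shot)": 49.96,
}

# Unweighted mean across all six tasks, matching the reported 61.62.
avg = sum(scores.values()) / len(scores)
```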

Use Cases

Given its DPO fine-tuning and benchmark performance, this model is well-suited for applications requiring robust instruction following, conversational AI, and general text generation where preference alignment is beneficial. It can be used for tasks such as chatbots, content creation, and summarization.
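Zephyr-recipe models are typically prompted with a `<|system|>`/`<|user|>`/`<|assistant|>` chat format. Assuming this model inherits that template from its SFT base (an assumption worth verifying against the model's `tokenizer.chat_template`), a prompt could be assembled like this (the helper function is hypothetical):

```python
def build_zephyr_prompt(user_message, system_message="You are a helpful assistant."):
    # Zephyr-style chat format (assumed; check the model's tokenizer.chat_template
    # on the Hugging Face Hub before relying on this layout).
    return (
        f"<|system|>\n{system_message}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_zephyr_prompt("Summarize the plot of Hamlet in two sentences.")
```

In practice, the more robust route is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from the transformers library, which reads the template shipped with the model instead of hard-coding it.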