wandb/gemma-7b-zephyr-dpo

Text Generation · Concurrency Cost: 1 · Model Size: 8.5B · Quant: FP8 · Context Length: 8k · Published: Feb 28, 2024 · License: gemma-terms-of-use · Architecture: Transformer

wandb/gemma-7b-zephyr-dpo is an 8.5 billion parameter GPT-like language model, fine-tuned with the DPO (Direct Preference Optimization) recipe on top of an SFT (Supervised Fine-Tuning) Gemma 7B base. Developed by wandb, the model is primarily English-language and performs well on general reasoning and language understanding tasks, achieving an average score of 61.62 on the Open LLM Leaderboard benchmarks. It is suitable for applications requiring robust conversational AI and instruction-following capabilities.


Overview

wandb/gemma-7b-zephyr-dpo is an 8.5 billion parameter GPT-like model, developed by wandb, that has been fine-tuned using the Direct Preference Optimization (DPO) recipe. This DPO application was performed on top of a Supervised Fine-Tuning (SFT) version of the Gemma 7B model, specifically wandb/gemma-7b-zephyr-sft. The training process utilized the DPO script from the Hugging Face alignment-handbook recipe, with logging to Weights & Biases.
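DPO trains the model to prefer chosen responses over rejected ones by comparing the policy's log-probabilities against a frozen reference model (here, the SFT checkpoint). A minimal sketch of the per-example DPO loss, for illustration only (the actual training uses the alignment-handbook's TRL-based implementation; the function name and inputs here are hypothetical):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# When the policy favors the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2) (the zero-margin value).
loss = dpo_loss(-10.0, -14.0, -11.0, -12.0)
```

The `beta` hyperparameter controls how far the policy is allowed to drift from the reference model; small values (e.g. 0.01–0.1) are typical in alignment-handbook configs.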

Key Capabilities & Performance

This model is primarily English-language and demonstrates strong general language understanding and reasoning abilities. Its performance on the Open LLM Leaderboard includes:

  • Avg. Score: 61.62
  • AI2 Reasoning Challenge (25-shot): 60.84
  • HellaSwag (10-shot): 80.44
  • MMLU (5-shot): 60.60
  • TruthfulQA (0-shot): 42.48
  • Winogrande (5-shot): 75.37
  • GSM8k (5-shot): 49.96
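The leaderboard average is the simple (unweighted) mean of the six benchmark scores listed above:

```python
# Open LLM Leaderboard scores as reported above.
scores = {
    "ARC (25-shot)": 60.84,
    "HellaSwag (10-shot)": 80.44,
    "MMLU (5-shot)": 60.60,
    "TruthfulQA (0-shot)": 42.48,
    "Winogrande (5-shot)": 75.37,
    "GSM8k (5-shot)": 49.96,
}

# Unweighted mean across all six tasks, matching the reported 61.62.
avg = sum(scores.values()) / len(scores)
```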

Use Cases

Given its DPO fine-tuning and benchmark performance, this model is well-suited for applications requiring robust instruction following, conversational AI, and general text generation where preference alignment is beneficial. It can be used for tasks such as chatbots, content creation, and summarization.
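Zephyr-recipe models are typically prompted with a `<|system|>`/`<|user|>`/`<|assistant|>` chat format. Assuming this model inherits that template from its SFT base (an assumption worth verifying against the model's `tokenizer.chat_template`), a prompt could be assembled like this (the helper function is hypothetical):

```python
def build_zephyr_prompt(user_message, system_message="You are a helpful assistant."):
    # Zephyr-style chat format (assumed; check the model's tokenizer.chat_template
    # on the Hugging Face Hub before relying on this layout).
    return (
        f"<|system|>\n{system_message}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_zephyr_prompt("Summarize the plot of Hamlet in two sentences.")
```

In practice, the more robust route is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from the transformers library, which reads the template shipped with the model instead of hard-coding it.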