Name: li-muyang/zephyr-8b-dpo-full API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: li-muyang

Model Overview

li-muyang/zephyr-8b-dpo-full is an 8-billion-parameter language model based on meta-llama/Llama-3.1-8B. It has been fine-tuned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. DPO trains directly on pairs of chosen and rejected responses, with the aim of aligning the model's outputs more closely with human preferences.

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07, a batch size of 4, and a total effective batch size of 128 across 8 GPUs. Training used an Adam optimizer and a cosine learning rate scheduler with a warmup ratio of 0.1. The reported evaluation rewards accuracy is 0.7656, meaning that on roughly 77% of evaluation pairs the model assigned a higher implicit reward to the chosen response than to the rejected one.
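The arithmetic behind these hyperparameters can be sketched as follows. This assumes the batch size of 4 is per GPU, in which case an effective batch size of 128 on 8 GPUs implies 4 gradient accumulation steps; the listing does not state that figure directly. The schedule function is a generic linear-warmup, cosine-decay formula, and `total_steps` is a hypothetical value chosen for illustration.

```python
import math

PEAK_LR = 5e-7          # from the listing
BATCH_PER_GPU = 4       # from the listing (assumed to be per device)
NUM_GPUS = 8            # from the listing
EFFECTIVE_BATCH = 128   # from the listing
WARMUP_RATIO = 0.1      # from the listing


def implied_grad_accum_steps(per_gpu: int, gpus: int, effective: int) -> int:
    """Gradient accumulation steps needed to reach the effective batch size."""
    per_step = per_gpu * gpus
    if effective % per_step != 0:
        raise ValueError("effective batch size is not a multiple of per_gpu * gpus")
    return effective // per_step


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("implied gradient accumulation steps:",
      implied_grad_accum_steps(BATCH_PER_GPU, NUM_GPUS, EFFECTIVE_BATCH))

total_steps = 1000  # hypothetical; the real step count depends on dataset size
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr(s, total_steps))
```

With a warmup ratio of 0.1, the learning rate climbs linearly over the first tenth of training, peaks at 5e-07, and then decays along a half cosine for the remainder of the epoch.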

Key Characteristics

Base Model: meta-llama/Llama-3.1-8B
Fine-tuning Method: Direct Preference Optimization (DPO)
Dataset: HuggingFaceH4/ultrafeedback_binarized
Parameter Count: 8 billion

Potential Use Cases

Given its preference-based fine-tuning, this model is a candidate for applications requiring:

Preference-aligned text generation: Producing outputs that are generally favored by human evaluators.
Conversational AI: Generating more natural and helpful dialogue responses.
Instruction following: Adhering to user instructions with improved quality compared to base models.
