weqweasdas/zephyr-7b-dpo-full

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Apr 30, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights · Concurrency cost: 1

weqweasdas/zephyr-7b-dpo-full is a 7-billion-parameter language model fine-tuned from alignment-handbook/zephyr-7b-sft-full using Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. DPO trains the model to favor responses that human annotators preferred over rejected alternatives, which makes it suitable for conversational AI and instruction-following tasks. The model supports a context length of 4096 tokens.
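A minimal usage sketch with the Hugging Face transformers library is shown below. It assumes the repository ships the chat template inherited from the Zephyr SFT base and that fp16 weights fit on the available GPU; the generation settings are illustrative, not values published by the author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "weqweasdas/zephyr-7b-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision fits the GPU
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain direct preference optimization in one paragraph."},
]
# Assumes a chat template is defined in the repo (inherited from the SFT base).
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```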


Overview

weqweasdas/zephyr-7b-dpo-full is a 7-billion-parameter language model published by the Hugging Face user weqweasdas. It is a fine-tuned variant of alignment-handbook/zephyr-7b-sft-full, optimized using Direct Preference Optimization (DPO).

Key Capabilities

  • Preference Alignment: Fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset, the model is trained to generate responses that humans prefer, as reflected in its DPO training metrics.
  • Instruction Following: The DPO objective (sketched below, after this list) pushes the model to rank chosen instruction-following responses above rejected alternatives.
  • Base Model: Built on the alignment-handbook/zephyr-7b-sft-full checkpoint, it inherits that model's foundational language understanding and generation capabilities.
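
For intuition, the DPO objective referenced above can be written as a short function over summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. This is a generic sketch of the standard DPO loss (Rafailov et al., 2023), not the author's training code; beta=0.1 is an assumed value.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # sum of log p_theta(y_chosen | x) per pair
    policy_rejected_logps: torch.Tensor,  # sum of log p_theta(y_rejected | x) per pair
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen SFT reference
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # assumed value; controls deviation from the reference
) -> torch.Tensor:
    # Implicit rewards are beta-scaled log-ratios against the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # The rewards/accuracies metric reported below is the fraction of pairs
    # where chosen_rewards > rejected_rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```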

Training Details

The model was trained for 2 epochs with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs. The final reported training loss is 0.5590, and the rewards/accuracies metric reaches 0.7857, meaning the model assigns a higher implicit reward to the preferred response in roughly 79% of preference pairs.
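
For concreteness, here is a hedged sketch of how those reported hyperparameters could map onto TRL's DPOConfig, assuming a recent TRL release; the per-device batch size, gradient accumulation steps, and beta are assumptions, chosen only so that 4 GPUs reach the stated total batch size of 64.

```python
from trl import DPOConfig

# Only epochs, learning rate, and the 64-example total batch size are
# reported for this model; every value marked "assumption" is illustrative.
config = DPOConfig(
    output_dir="zephyr-7b-dpo-full",
    num_train_epochs=2,
    learning_rate=5e-7,
    per_device_train_batch_size=4,  # assumption
    gradient_accumulation_steps=4,  # assumption: 4 GPUs x 4 x 4 = 64 total
    beta=0.1,                       # assumption: common DPO default
)
```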

Good For

  • Applications requiring models that align closely with human preferences.
  • Conversational AI systems where response quality and desirability are critical.
  • Tasks involving instruction following and generating helpful, harmless, and honest outputs.