YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_2

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Apr 10, 2025 | Architecture: Transformer

YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_2 is a 7 billion parameter language model, fine-tuned from alignment-handbook/zephyr-7b-sft-full. This model was trained using Direct Preference Optimization (DPO) with TRL, enhancing its ability to align with human preferences. It is designed for text generation tasks where preference alignment is crucial, offering improved response quality based on explicit feedback.


Model Overview

This model, YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_2, is a 7 billion parameter language model built upon the foundation of alignment-handbook/zephyr-7b-sft-full. It has been specifically fine-tuned using Direct Preference Optimization (DPO), a method that leverages human preference data to improve model alignment without requiring a separate reward model. The training was conducted using the TRL (Transformer Reinforcement Learning) library.
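For quick experimentation, the model can be loaded with the Transformers pipeline API. The snippet below is a minimal sketch, assuming the checkpoint is available on the Hugging Face Hub under this repo id and inherits the Zephyr chat template from its base model; the dtype, device placement, and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal text-generation sketch with Transformers (assumed setup, not an official example).
import torch
from transformers import pipeline

model_id = "YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_2"

# Load the checkpoint from the Hugging Face Hub; bfloat16 and device_map
# are illustrative choices for a single-GPU setup.
generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what DPO fine-tuning changes about a model."},
]

# Passing a message list makes the pipeline apply the tokenizer's chat template.
result = generator(messages, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"][-1]["content"])
```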

Key Capabilities

  • Preference Alignment: Enhanced to generate responses that better align with human preferences, thanks to DPO training.
  • Text Generation: Capable of various text generation tasks, producing outputs informed by preference-based learning.
  • Fine-tuned from Zephyr-7B-SFT-Full: Benefits from the strong base capabilities of its parent model, further refined for alignment.

Training Details

The model's training procedure involved:

  • Methodology: Direct Preference Optimization (DPO), as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model".
  • Framework: Utilized the TRL library (version 0.12.0) for the training process; see the sketch after this list.
  • Dependencies: Built with Transformers 4.46.3, PyTorch 2.3.0, Datasets 3.1.0, and Tokenizers 0.20.3.
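For context, the following is a minimal sketch of how a DPO run like this can be set up with TRL's DPOTrainer. The dataset, beta, and batch-size split are assumptions (the learning rate and effective batch size are only inferred from the model name), not the author's exact recipe.

```python
# Hedged DPO training sketch with TRL's DPOTrainer; hyperparameters are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference dataset with "prompt", "chosen", "rejected" columns.
# UltraFeedback-binarized is used here as a stand-in; the actual training data
# (an UltraFeedback/Skywork preference mix, per the model name) may differ.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="zephyr-7b-dpo",
    beta=0.1,                        # DPO temperature (assumed default)
    learning_rate=1e-7,              # matches the lr hinted in the model name
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective batch size on one GPU
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                     # reference model is created automatically when omitted
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```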

Good For

  • Applications requiring high-quality, preference-aligned text generation.
  • Scenarios where a model should incorporate human feedback signals learned implicitly through DPO training.
  • Developers looking for a 7B parameter model with strong conversational or instructional capabilities refined by preference learning.