YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_0

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Architecture: Transformer · Published: Apr 11, 2025

The YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_0 model is a 7 billion parameter language model fine-tuned from alignment-handbook/zephyr-7b-sft-full. It was trained with Direct Preference Optimization (DPO) using the TRL framework, so its outputs align more closely with human preferences. The model is intended for general text generation tasks, particularly those that benefit from preference-based fine-tuning.


Model Overview

This model, developed by YuchenLi01, is a 7 billion parameter language model derived from the alignment-handbook/zephyr-7b-sft-full base model. It has been specifically fine-tuned using Direct Preference Optimization (DPO), a method detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model." This training approach aims to align the model's outputs more closely with human preferences.
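The card does not include a usage snippet, so here is a minimal inference sketch, assuming the model exposes the standard Hugging Face `transformers` text-generation interface and chat template inherited from its Zephyr base; the prompt and sampling settings are illustrative only.

```python
# Minimal inference sketch (assumption: standard transformers interface,
# chat template inherited from zephyr-7b-sft-full).
import torch
from transformers import pipeline

model_id = "YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_0"

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,  # adjust dtype/device to your hardware
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Direct Preference Optimization in two sentences."},
]

# The pipeline applies the tokenizer's chat template before generation.
outputs = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"][-1]["content"])  # assistant reply
```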

Key Capabilities

  • Preference-aligned text generation: Enhanced to produce responses that are preferred by humans, thanks to DPO training.
  • General-purpose language understanding and generation: Suitable for a wide range of conversational and text completion tasks.
  • Built on Zephyr-7B-SFT-Full: Inherits the strong foundational capabilities of its base model; see the prompt-format sketch after this list.
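To make that last point concrete, the sketch below shows the prompt layout, assuming the tokenizer inherits the Zephyr chat template from the base model; verify the exact special tokens against the repository's tokenizer config.

```python
# Prompt-format sketch: assuming the Zephyr chat template is inherited,
# apply_chat_template produces the <|system|>/<|user|>/<|assistant|>
# layout the model was tuned on.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_0"
)
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a friendly chatbot."},
        {"role": "user", "content": "What is preference alignment?"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
# Expected shape (Zephyr-style template):
# <|system|>
# You are a friendly chatbot.</s>
# <|user|>
# What is preference alignment?</s>
# <|assistant|>
```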

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework, version 0.12.0. DPO reparameterizes the reward in terms of the policy itself, so pairs of preferred and rejected responses can be used to optimize the language model directly, with no separately trained reward model. This makes it a simple and effective way to improve response quality from human preference feedback.
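For orientation, here is a minimal sketch of DPO training with TRL 0.12.0, not the author's actual script: the dataset is a stand-in for the UltraFeedback/Skywork preference mixture implied by the model name, and the hyperparameters are assumptions loosely read off that name (lr1e-07 suggesting a learning rate of 1e-07, ebs128 an effective batch size of 128).

```python
# DPO training sketch with TRL 0.12.0. Dataset and hyperparameters are
# assumptions, not the author's exact configuration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Preference data must provide "prompt", "chosen", and "rejected" columns;
# this public dataset is a stand-in for the actual training mixture.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="zephyr-7b-dpo",
    learning_rate=1e-7,             # assumption from "lr1e-07" in the name
    per_device_train_batch_size=4,  # 4 x 32 = 128 on one GPU, matching "ebs128"
    gradient_accumulation_steps=32,
    beta=0.1,                       # DPO KL-penalty strength (TRL default)
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # ref model defaults to a frozen copy of `model`
)
trainer.train()
```

Because DPO folds the reward model into the policy's own log-probabilities, the only extra cost over supervised fine-tuning is a forward pass through the frozen reference model.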