Name: YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-06_2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: YuchenLi01

Model Overview

YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-06_2 is a 7 billion parameter language model built upon the alignment-handbook/zephyr-7b-sft-full base model. It has been further fine-tuned using the Direct Preference Optimization (DPO) method, as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model". This training approach aims to align the model's outputs more closely with human preferences without requiring a separate reward model.

Key Capabilities

Preference-aligned text generation: Optimized to produce responses that are preferred by humans, making it suitable for interactive applications.
Instruction following: Capable of generating coherent and relevant text based on user prompts.
Built on Zephyr-7B-SFT-Full: Leverages the strong foundational capabilities of its base model.

Training Details

The model was trained using the TRL library (version 0.12.0) with DPO. The training process utilized specific versions of key frameworks including Transformers (4.46.3), Pytorch (2.3.0), Datasets (3.1.0), and Tokenizers (0.20.3).

Good For

Developing conversational agents that require preference-aligned responses.
Applications where human-like quality and alignment are crucial.
Research into DPO and preference-based fine-tuning methods.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)