YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs256_lr5e-06_0
TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Mar 1, 2025Architecture:Transformer Cold

YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs256_lr5e-06_0 is a 7 billion parameter language model fine-tuned from alignment-handbook/zephyr-7b-sft-full. This model was trained using Direct Preference Optimization (DPO) with the TRL framework, enhancing its ability to align with human preferences. It is designed for text generation tasks where nuanced and preferred responses are critical, building upon the base Zephyr architecture. The model leverages a 4096 token context length for processing user inputs.

Loading preview...