YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs64_lr5e-06_0
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 11, 2025 · Architecture: Transformer

YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs64_lr5e-06_0 is a 7-billion-parameter language model fine-tuned from alignment-handbook/zephyr-7b-sft-full. It was trained with Direct Preference Optimization (DPO) using the TRL framework and specializes in generating responses aligned with human preferences. The model is intended for conversational AI and instruction-following tasks, where preference-based learning improves response quality over the supervised fine-tuned base.
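To illustrate the training objective mentioned above, here is a minimal sketch of the per-pair DPO loss. This is not code from the model's actual training run (which used TRL); the function name and log-probability inputs are illustrative, and `beta` is a stand-in for whatever KL-penalty coefficient was actually used.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin is the policy's log-ratio advantage of the chosen
    response over the rejected one, relative to the reference model."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(x) == log(1 + exp(-x)); log1p keeps it numerically stable
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference (zero margin) the loss is log 2; as the policy assigns relatively more probability to the chosen response, the loss falls toward zero, which is the preference-alignment pressure the description refers to.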
