YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr5e-06_1
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 9, 2025 · Architecture: Transformer · Status: Cold
YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr5e-06_1 is a 7-billion-parameter language model fine-tuned from alignment-handbook/zephyr-7b-sft-full. It was trained with Direct Preference Optimization (DPO) using the TRL framework to align its outputs with human preferences. The model is intended for general text generation tasks where preference alignment is important, offering improved response quality grounded in human preference data.
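As a usage illustration only, the sketch below loads the model with the Hugging Face transformers pipeline. It assumes the model inherits the Zephyr chat template from its SFT base, which this card does not confirm; the prompt and sampling settings are placeholders.

```python
# Minimal inference sketch, assuming the model inherits the Zephyr chat
# template from alignment-handbook/zephyr-7b-sft-full (an assumption, not
# confirmed by this card). Prompt and sampling settings are illustrative.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr5e-06_1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain preference alignment in one paragraph."},
]
# Format the chat turns with the tokenizer's chat template, then generate.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
output = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```

Note that the 4k context length listed above bounds the prompt plus generated tokens, so keep max_new_tokens within that budget.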