YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs64_lr1e-06_4
Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Apr 10, 2025 · Architecture: Transformer
YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs64_lr1e-06_4 is a 7-billion-parameter language model fine-tuned by YuchenLi01. It is based on alignment-handbook/zephyr-7b-sft-full and was trained with Direct Preference Optimization (DPO) via the TRL framework. The model is tuned to generate responses aligned with human preferences, making it suitable for conversational AI and instruction-following tasks where preference-aligned outputs matter.
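As a rough illustration of what DPO optimizes, the sketch below computes the standard DPO loss for a single preference pair from policy and reference log-probabilities. This is a minimal pure-Python sketch of the published DPO objective, not this model's actual training code; the function name and the beta value are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    loss = -log sigmoid(beta * [(log pi(y_c|x) - log pi_ref(y_c|x))
                                - (log pi(y_r|x) - log pi_ref(y_r|x))])
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # Numerically plain logistic loss on the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # ~0.6931
# When the policy favors the chosen response more than the reference does,
# the loss drops below log 2.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

In TRL itself this objective is handled by `DPOTrainer`, which computes these log-probabilities from the policy and a frozen reference copy of the SFT model; beta controls how far the policy may drift from the reference.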