YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr1e-06_3
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Apr 10, 2025 · Architecture: Transformer

YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs32_lr1e-06_3 is a 7-billion-parameter language model fine-tuned from alignment-handbook/zephyr-7b-sft-full. It was trained with Direct Preference Optimization (DPO) using the TRL framework, which improves its alignment with human preferences. The model is intended for general text generation tasks, particularly those that benefit from preference-based alignment.
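The DPO objective mentioned above can be sketched in a few lines: given log-probabilities of a chosen and a rejected response under the policy and a frozen reference model, it is the negative log-sigmoid of the scaled difference of their log-ratios. The `beta` value and the example log-probabilities below are illustrative assumptions, not this model's actual training hyperparameters.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi(y_l|x) - log pi_ref(y_l|x)
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))          # -log sigmoid(logits)

# When the policy favors the chosen response more strongly than the reference
# model does, the loss drops below -log(0.5) ~= 0.693 (the zero-margin value).
loss = dpo_loss(-10.0, -14.0, -11.0, -12.0)
```

Training minimizes this loss over preference pairs, pushing the policy to rank chosen responses above rejected ones while the reference model anchors it near the SFT starting point.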
