YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_3
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 10, 2025 · Architecture: Transformer

YuchenLi01/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_3 is a 7-billion-parameter language model fine-tuned by YuchenLi01 on top of alignment-handbook/zephyr-7b-sft-full. It was trained with Direct Preference Optimization (DPO) using TRL, a preference-tuning method that treats the language model itself as an implicit reward model, avoiding the need to train a separate reward model. The model is intended for generating high-quality, preference-aligned text, making it suitable for conversational AI and instruction-following tasks.
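To make the "implicit reward model" idea concrete, here is a minimal sketch of the per-example DPO loss computed from log-probabilities. The numeric inputs are illustrative dummy values, not from this model's training run; `beta=0.1` is a commonly used default, not necessarily the value used here.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities.

    pi_*  : log-prob of the chosen/rejected response under the policy
    ref_* : log-prob of the same responses under the frozen reference model
    The implicit reward is beta * (log pi - log ref).
    """
    chosen_reward = beta * (pi_chosen - ref_chosen)
    rejected_reward = beta * (pi_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Dummy log-probs: the policy favors the chosen response relative to the
# reference, so the loss drops below log(2) (the value at zero margin).
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-12.0)
```

At initialization the policy equals the reference, the margin is zero, and the loss is exactly log(2); training lowers it by widening the implicit reward margin between chosen and rejected responses.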
