Zhaoxuan/PUGC-Mistral-DPO
Task: Text Generation | Model Size: 7B | Quant: FP8 | Context Length: 4k | License: apache-2.0 | Architecture: Transformer | Open Weights | Concurrency Cost: 1

Zhaoxuan/PUGC-Mistral-DPO is a 7-billion-parameter language model fine-tuned from Mistral-7B-Instruct-v0.2 with Direct Preference Optimization (DPO). Rather than relying on curated preference annotations, it is trained on implicit preferences extracted from user-generated content (UGC), which is converted into preference data to support scalable, domain-specific alignment. The model is reported to show improved alignment with human preferences, which makes the approach most useful where curated preference data is costly to collect.
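DPO trains the policy directly on (chosen, rejected) response pairs against a frozen reference model, typically the starting checkpoint, without fitting a separate reward model. The snippet below is a minimal sketch of the standard per-pair DPO objective computed from precomputed sequence log-probabilities. The function names and the `beta=0.1` default are illustrative assumptions, not the settings actually used to train this model.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed KL-strength hyperparameter, illustrative only
) -> float:
    # Log-ratio of policy vs. reference for each response (the implicit reward / beta).
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def dpo_batch_loss(batch, beta: float = 0.1) -> float:
    # Mean DPO loss over a list of 4-tuples of sequence log-probabilities.
    return sum(dpo_loss(*pair, beta=beta) for pair in batch) / len(batch)


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

When the policy matches the reference the loss equals log 2; it falls below that as the policy raises the relative likelihood of the chosen response over the rejected one.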
