CriteriaPO/llama3.2-3b-dpo-vanilla
Task: Text generation
Concurrency cost: 1
Model size: 3.2B
Quantization: BF16
Context length: 32k
Published: May 13, 2025
Architecture: Transformer

CriteriaPO/llama3.2-3b-dpo-vanilla is a 3-billion-parameter language model, fine-tuned from CriteriaPO/llama3.2-3b-sft-10 using Direct Preference Optimization (DPO). The model is designed to align its outputs more closely with human preferences, making it suitable for conversational AI and instruction-following tasks. Compared to its SFT base model, the DPO training improves response quality and relevance, and it is particularly effective at generating coherent, preferred text in interactive applications.
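To illustrate what DPO fine-tuning optimizes, the sketch below computes the standard DPO objective for a single preference pair: it rewards the policy for assigning a higher log-probability (relative to the SFT reference model) to the chosen response than to the rejected one. This is a minimal, generic illustration of the DPO loss, not code from this model's actual training pipeline; the function name and the `beta` default are assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the total log-probability a model assigns to a response;
    `ref_*` values come from the frozen reference (here, the SFT base model).
    `beta` (assumed 0.1 for illustration) controls how far the policy may
    drift from the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): small when the policy clearly prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss falls below log(2) ≈ 0.693.
loss = dpo_loss(policy_chosen_logp=-5.0, policy_rejected_logp=-9.0,
                ref_chosen_logp=-6.0, ref_rejected_logp=-8.0)
```

Minimizing this loss over many preference pairs is what pushes the DPO model's outputs toward human-preferred responses relative to the SFT baseline.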
