W-61/llama-3-8b-base-margin-dpo-ultrafeedback-8xh200
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 10, 2026 · Architecture: Transformer · Cold

W-61/llama-3-8b-base-margin-dpo-ultrafeedback-8xh200 is an 8-billion-parameter language model fine-tuned by W-61. It is a DPO-tuned variant of llama-3-8b-base-sft-ultrachat, trained on the HuggingFaceH4/ultrafeedback_binarized preference dataset. Direct preference optimization (DPO) aligns a model with human preferences by raising the likelihood of preferred responses relative to rejected ones; this run reports a margin DPO mean of 72.1584 on its evaluation set. The model is suited to applications that need preference-aligned, conversational text generation.
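For reference, below is a minimal inference sketch using Hugging Face transformers. The repo id is assumed from this page's title and may not match the hosted artifact; the dtype and device settings are illustrative defaults, not the FP8 serving configuration used here.

```python
# Minimal loading/generation sketch; repo id assumed from the page title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-margin-dpo-ultrafeedback-8xh200"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # FP8 inference is runtime-specific; bf16 is a safe local default
    device_map="auto",
)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```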

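The reported margin metric is presumably the mean implicit-reward margin that DPO trainers commonly log: beta times the difference in policy-versus-reference log-likelihood ratios between chosen and rejected completions. A hedged sketch of that computation, with illustrative names and an assumed beta, follows; whether this is exactly how W-61 computed the 72.1584 figure is not stated on this page.

```python
# Hedged sketch of the implicit DPO reward margin, assuming the page's
# "margin DPO mean" is the batch-averaged rewards/margins quantity that
# DPO trainers log. All names and the beta value are illustrative.
import torch

def dpo_reward_margin(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_l | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # log p_ref(y_w | x), shape (batch,)
    ref_rejected_logps: torch.Tensor,     # log p_ref(y_l | x), shape (batch,)
    beta: float = 0.1,                    # assumed DPO temperature
) -> torch.Tensor:
    """Per-example implicit reward margin; its mean is the usual logged metric."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return chosen_rewards - rejected_rewards
```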