W-61/llama3-8b-dpo-4xh100-pilot
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 28, 2026 · Architecture: Transformer · Status: Cold

W-61/llama3-8b-dpo-4xh100-pilot is an 8-billion-parameter language model fine-tuned from princeton-nlp/Llama-3-Base-8B-SFT using Direct Preference Optimization (DPO) with the TRL framework. The DPO stage aligns the model's outputs with human preferences, making it suitable for general text generation tasks. The model supports a context length of 8192 tokens.
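A minimal usage sketch, assuming the model is published on the Hugging Face Hub under the repo ID above and loads through the standard transformers API. The dtype and generation settings below are illustrative choices, not documented defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama3-8b-dpo-4xh100-pilot"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights; the FP8 quant above likely refers to serving
    device_map="auto",
)

prompt = "Explain Direct Preference Optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```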

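For context on the training method, here is a hedged sketch of a DPO run with TRL. The base model matches the card; the preference dataset, DPOConfig values, and output path are hypothetical placeholders, since the card does not state the actual training data or hyperparameters.

```python
# Illustrative DPO fine-tuning sketch with TRL; the dataset, beta, and batch
# sizes are assumptions, not the settings used to train this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "princeton-nlp/Llama-3-Base-8B-SFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Any preference dataset with "prompt"/"chosen"/"rejected" columns works;
# this public one is used purely as an example.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="llama3-8b-dpo-pilot",  # hypothetical path
    beta=0.1,                          # assumed strength of the implicit KL penalty
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # recent TRL versions; older ones take tokenizer=
)
trainer.train()
```

When no reference model is passed, DPOTrainer clones the policy to serve as the frozen reference, which matches the usual single-model DPO setup.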