princeton-nlp/Llama-3-Instruct-8B-RDPO
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: May 17, 2024 · Architecture: Transformer

Llama-3-Instruct-8B-RDPO is an 8-billion-parameter instruction-tuned language model released by princeton-nlp. It is fine-tuned with RDPO (length-regularized DPO), a preference-optimization method that adds a length penalty to the DPO objective to discourage needlessly verbose responses; princeton-nlp released this checkpoint alongside its SimPO (Simple Preference Optimization) models as part of the same project. The model is designed for conversational AI and instruction-following tasks.
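To make the length-regularized objective concrete, here is a minimal sketch of one common formulation of the R-DPO loss for a single preference pair. The function name, argument names, and the hyperparameter values `beta` and `alpha` are illustrative assumptions, not the settings used to train this checkpoint:

```python
import math

def rdpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
              len_w, len_l, beta=0.1, alpha=0.01):
    """Sketch of a length-regularized DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_logp_w / ref_logp_l: reference-model log-probs of the same responses.
    len_w / len_l: response lengths in tokens.
    """
    # Standard DPO margin: scaled difference of policy-vs-reference log-ratios.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Length regularization: penalize the margin when the chosen response
    # is longer than the rejected one, discouraging verbosity.
    margin -= alpha * (len_w - len_l)
    # Negative log-sigmoid of the regularized margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When `len_w == len_l` the penalty vanishes and the expression reduces to the plain DPO loss, which is the intuition behind the method: only the extra length of the preferred response is taxed.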
