princeton-nlp/Llama-3-Instruct-8B-RDPO
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: May 17, 2024 · Architecture: Transformer · Warm
Llama-3-Instruct-8B-RDPO is an 8-billion-parameter instruction-tuned language model released by princeton-nlp. It is fine-tuned from Llama-3-8B-Instruct with R-DPO (length-regularized Direct Preference Optimization), one of the preference-optimization checkpoints released alongside the SimPO (Simple Preference Optimization) paper; SimPO itself is a reference-free preference-optimization method. The model is intended primarily for conversational AI and instruction-following tasks.
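As a rough illustration of what the "RDPO" in the model name refers to, the sketch below computes a per-example R-DPO loss: the standard DPO log-ratio margin with a length-difference regularizer subtracted inside the sigmoid. This is a minimal sketch assuming the common R-DPO formulation; the function name, argument names, and default hyperparameters are illustrative, not taken from this model's training recipe.

```python
import math

def rdpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l,
              len_w, len_l, beta=0.1, alpha=0.005):
    """Illustrative R-DPO loss for one preference pair (assumed formulation).

    pi_logp_* / ref_logp_*: summed log-probs of the chosen (w) and
    rejected (l) responses under the policy and reference models.
    len_w / len_l: response lengths in tokens; alpha penalizes the
    margin when the chosen response is longer than the rejected one.
    """
    # DPO margin: beta times the difference of policy/reference log-ratios.
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # Length regularization: discourage winning purely by being longer.
    margin -= alpha * (len_w - len_l)
    # Negative log-sigmoid of the regularized margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With identical log-ratios and equal lengths the margin is zero, so the loss reduces to -log(1/2); a longer chosen response with the same log-ratios yields a strictly higher loss.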
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model.
temperature: –
top_p: –
top_k: –
frequency_penalty: –
presence_penalty: –
repetition_penalty: –
min_p: –
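As a sketch of how the sampler settings listed above would be supplied in practice, the snippet below builds a chat-completion request body in the OpenAI-compatible style that many inference providers expose. The numeric values are placeholder assumptions, not recommendations from this page, and the extension parameters are only supported by some servers.

```python
# Illustrative request payload for an OpenAI-compatible chat endpoint.
# All sampler values below are placeholders, not values from this page.
payload = {
    "model": "princeton-nlp/Llama-3-Instruct-8B-RDPO",
    "messages": [{"role": "user", "content": "Hello!"}],
    # Standard OpenAI-style sampler parameters:
    "temperature": 0.7,
    "top_p": 0.9,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    # Common extensions accepted by some OpenAI-compatible servers
    # (support varies by provider):
    "top_k": 40,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}
```

The same dictionary can be posted as JSON to the provider's `/v1/chat/completions` route with any HTTP client.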