RLHFlow/LLaMA3-iterative-DPO-final
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8K · Published: May 17, 2024 · License: llama3 · Architecture: Transformer

RLHFlow/LLaMA3-iterative-DPO-final is an 8-billion-parameter LLaMA3-based instruct model developed by RLHFlow and fine-tuned with an iterative, DPO-based online RLHF recipe. On chat benchmarks such as Alpaca-Eval-V2, MT-Bench, and Chat-Arena-Hard, it outperforms other models of similar size, many larger open-source models, and strong proprietary models like GPT-3.5-turbo-0613. It is optimized for instruction following and general conversational tasks, and achieves this performance without relying on additional human or GPT-4 labeling.


Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model draw on the following sampler settings:

- temperature
- top_p
- top_k
- frequency_penalty
- presence_penalty
- repetition_penalty
- min_p
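The sampler parameters above map onto the body of an OpenAI-compatible completion request. The sketch below builds such a payload in Python; the numeric values are illustrative assumptions, not the actual Featherless user configurations (which are not shown on this page), and the endpoint URL is a placeholder.

```python
# Build a chat-completion payload for an OpenAI-compatible endpoint.
# The sampler values below are illustrative assumptions, not the
# actual Featherless user configs for this model.
payload = {
    "model": "RLHFlow/LLaMA3-iterative-DPO-final",
    "messages": [
        {"role": "user", "content": "Summarize DPO in one sentence."}
    ],
    # Sampler settings listed on the model page:
    "temperature": 0.7,         # randomness of token sampling
    "top_p": 0.9,               # nucleus-sampling cumulative-probability cutoff
    "top_k": 40,                # restrict sampling to the k most likely tokens
    "frequency_penalty": 0.0,   # penalize tokens by how often they appear
    "presence_penalty": 0.0,    # penalize tokens that already appear at all
    "repetition_penalty": 1.1,  # multiplicative penalty against repetition
    "min_p": 0.05,              # drop tokens below this relative probability
}

# The payload would then be POSTed with any HTTP client, e.g.:
# requests.post("https://<endpoint>/v1/chat/completions", json=payload, ...)
```

Not every OpenAI-compatible server accepts all seven fields (e.g. `top_k`, `repetition_penalty`, and `min_p` are extensions beyond the base OpenAI API), so unsupported keys may need to be dropped depending on the backend.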