Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 9, 2024License:llama3Architecture:Transformer0.1K Warm

Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R is an 8 billion parameter instruct model developed by Salesforce, based on the LLaMA-3 architecture with an 8192 token context length. It is distinguished by its iterative DPO-based online RLHF training method, which enables it to outperform many larger open-source and some proprietary models on instruct benchmarks like Alpaca-Eval-V2, MT-Bench, and Chat-Arena-Hard. This model is optimized for general instruction following and conversational AI tasks.

Loading preview...

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p