TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-8B-R
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 17, 2024Architecture:Transformer0.0K Warm
SFR-Iterative-DPO-LLaMA-3-8B-R is an 8 billion parameter instruct model developed by Salesforce, based on the LLaMA-3 architecture with an 8192 token context length. It utilizes an iterative DPO-based online RLHF training method, enabling it to outperform models of similar size and many larger open-source and proprietary models on instruct benchmarks like Alpaca-Eval-V2, MT-Bench, and Chat-Arena-Hard. This model is optimized for instruction following and general conversational AI tasks, achieving strong performance without relying on additional human or GPT-4 labeling.
Loading preview...
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.
temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p