chujiezheng/LLaMA3-iterative-DPO-final-ExPO
chujiezheng/LLaMA3-iterative-DPO-final-ExPO is an 8-billion-parameter LLaMA3-based language model with an 8192-token context length. Developed by chujiezheng, it applies the extrapolation (ExPO) technique with alpha = 0.3 to RLHFlow's LLaMA3-iterative-DPO-final and LLaMA3-SFT checkpoints. It is designed to improve alignment with human preferences, and it shows higher win rates on AlpacaEval 2.0 and higher MT-Bench scores than its base models and other LLMs.
Model Overview
chujiezheng/LLaMA3-iterative-DPO-final-ExPO is an 8-billion-parameter language model built on the LLaMA3 architecture. It is based on the RLHFlow/LLaMA3-iterative-DPO-final and RLHFlow/LLaMA3-SFT models and applies the extrapolation (ExPO) technique described in the paper "Weak-to-Strong Extrapolation Expedites Alignment". The method extrapolates from the weights of the SFT and DPO/RLHF checkpoints, here with an alpha value of 0.3, to obtain a model that is better aligned with human preferences.
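The weight extrapolation described above can be sketched roughly as follows. This is a minimal illustration, not the author's released code: it assumes the update takes the form theta_expo = theta_dpo + alpha * (theta_dpo - theta_sft), applied parameter by parameter, and it uses small Python lists as stand-ins for real weight tensors. The function name `expo_extrapolate` and the toy parameter names are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def expo_extrapolate(sft: Weights, aligned: Weights, alpha: float = 0.3) -> Weights:
    """Return aligned + alpha * (aligned - sft) for every parameter.

    Assumed form of the ExPO update; `sft` is the weaker (SFT) checkpoint
    and `aligned` is the stronger (DPO/RLHF) checkpoint.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    out: Weights = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        if len(w_sft) != len(w_aligned):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Move further along the direction from the SFT to the aligned weights.
        out[name] = [a + alpha * (a - s) for s, a in zip(w_sft, w_aligned)]
    return out


# Hypothetical toy checkpoints, for illustration only.
sft_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
dpo_weights = {"layer.weight": [2.0, 2.0], "layer.bias": [1.0]}
expo_weights = expo_extrapolate(sft_weights, dpo_weights, alpha=0.3)
```

With alpha = 0 the result equals the DPO/RLHF checkpoint unchanged. Larger alpha values push the weights further past it, away from the SFT checkpoint. With real checkpoints the same arithmetic would be applied to each tensor in the two models' state dictionaries.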
Key Capabilities & Performance
This model demonstrates enhanced performance across several benchmarks, indicating improved alignment and response quality:
- AlpacaEval 2.0: Consistently shows higher win rates and LC win rates compared to its base models and other evaluated LLMs, including various Zephyr, Starling-LM, Snorkel, InternLM2-chat, and Tulu-2-dpo variants. For instance, it boosts RLHFlow/LLaMA3-iterative-DPO-final's win rate from 29.2% to 32.7%.
- MT-Bench: Achieves improved scores on the MT-Bench benchmark, with the ExPO version outperforming the original models across all tested architectures. For example, RLHFlow/LLaMA3-iterative-DPO-final's MT-Bench score increases from 8.08 to 8.45 with ExPO.
Use Cases
This model is well suited to applications that need high-quality, human-aligned text generation and conversational AI, where better preference alignment leads to more satisfactory user interactions. Its improved results on general conversational benchmarks suggest it is useful for chatbots, content generation, and interactive AI systems.