chujiezheng/Llama-3-Instruct-8B-SimPO-ExPO
chujiezheng/Llama-3-Instruct-8B-SimPO-ExPO is an 8-billion-parameter Llama-3-Instruct-based model, developed by chujiezheng, that uses an extrapolation (ExPO) technique to enhance alignment with human preferences. The model is derived from Llama-3-Instruct-8B-SimPO and Meta-Llama-3-8B-Instruct, and achieves higher win rates on benchmarks such as AlpacaEval 2.0. It is specifically optimized for conversational AI tasks through this extrapolation method.
Llama-3-Instruct-8B-SimPO-ExPO Overview
This model, developed by chujiezheng, is an extrapolated (ExPO) version of Llama-3-Instruct-8B-SimPO, built from it and Meta-Llama-3-8B-Instruct. ExPO extrapolates, with an alpha value of 0.3, along the weight-space direction from a weaker checkpoint (the SFT/instruct model) toward a stronger aligned checkpoint (here, the SimPO model), aiming to achieve superior alignment with human preferences.
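The extrapolation step can be sketched in a few lines. This is a minimal illustration of the idea described above, not the authors' released code: it assumes checkpoints are plain name-to-tensor (here, name-to-float) dictionaries and applies the update theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft).

```python
def expo_extrapolate(theta_sft, theta_aligned, alpha=0.3):
    """Sketch of ExPO weight extrapolation.

    Moves past the aligned checkpoint along the SFT->aligned direction:
        theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)
    which is equivalent to (1 + alpha) * theta_aligned - alpha * theta_sft.
    Checkpoints are assumed to share identical parameter names.
    """
    return {
        name: (1 + alpha) * theta_aligned[name] - alpha * theta_sft[name]
        for name in theta_aligned
    }

# Toy example with a single scalar "parameter":
sft = {"w": 0.0}
aligned = {"w": 1.0}
print(expo_extrapolate(sft, aligned, alpha=0.3))  # {'w': 1.3}
```

With alpha = 0, this returns the aligned checkpoint unchanged; larger alpha pushes further along the alignment direction.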
Key Enhancements & Performance
The primary differentiator of this model is its extrapolation method, which significantly boosts its performance in conversational and preference-based evaluations. On the AlpacaEval 2.0 benchmark, the ExPO model achieves a 40.6% win rate and a 45.8% LC win rate, surpassing the original Llama-3-Instruct-8B-SimPO's 40.5% and 44.7% respectively. This improvement indicates better human preference alignment. The model also shows consistent gains across various other models when the ExPO technique is applied, as demonstrated in the evaluation tables for both AlpacaEval 2.0 and MT-Bench, where the "+ ExPO" versions consistently outperform their original counterparts.
Ideal Use Cases
- Conversational AI: Its enhanced human preference alignment makes it suitable for chatbots and interactive agents where natural and preferred responses are crucial.
- Benchmarking and Research: Useful for researchers exploring weak-to-strong extrapolation techniques and their impact on model alignment and performance.
- Applications requiring high win rates: Scenarios where maximizing user satisfaction and preferred outcomes in generative tasks is a priority.
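For conversational use, the model expects the standard Llama 3 Instruct chat format. The sketch below builds that prompt by hand so the structure is visible; in practice the tokenizer's built-in chat template would normally be used, and the system/user strings here are placeholders.

```python
def build_llama3_prompt(system, user):
    """Assemble a single-turn prompt in the Llama 3 Instruct chat format.

    Each message is wrapped in header tokens and terminated with <|eot_id|>;
    the trailing assistant header cues the model to generate its reply.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful assistant.",  # placeholder system message
    "Summarize what ExPO does in one sentence.",  # placeholder user turn
)
print(prompt)
```

The string returned here is what the tokenizer's `apply_chat_template` would produce for the same messages, so either route yields an equivalent prompt.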