chujiezheng/Llama-3-Instruct-8B-SimPO-ExPO

Hugging Face

  • Task: Text Generation
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 8k
  • Concurrency Cost: 1
  • Published: May 26, 2024
  • License: llama3
  • Architecture: Transformer

chujiezheng/Llama-3-Instruct-8B-SimPO-ExPO is an 8-billion-parameter Llama-3-Instruct-based model, developed by chujiezheng, that uses an extrapolation (ExPO) technique to enhance alignment with human preferences. The model builds upon Llama-3-Instruct-8B-SimPO and Meta-Llama-3-8B-Instruct, achieving superior performance on benchmarks such as AlpacaEval 2.0, and is optimized for improved win rates in conversational AI tasks through this extrapolation method.

Llama-3-Instruct-8B-SimPO-ExPO Overview

This model, developed by chujiezheng, is an extrapolated (ExPO) version built from Llama-3-Instruct-8B-SimPO and Meta-Llama-3-8B-Instruct. It applies a weight-space extrapolation technique with an alpha value of 0.3 to the SFT and DPO/RLHF checkpoints, extrapolating past the aligned checkpoint to achieve superior alignment with human preferences.
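The extrapolation described above can be sketched as a simple operation over matching parameter tensors. This is a minimal illustration, not the author's release script: it assumes two state dicts with identical keys, `sft` (the SFT/base-instruct checkpoint) and `aligned` (the DPO/RLHF-tuned checkpoint), and computes theta_expo = (1 + alpha) * theta_aligned - alpha * theta_sft.

```python
def expo_extrapolate(sft: dict, aligned: dict, alpha: float = 0.3) -> dict:
    """Extrapolate past the aligned checkpoint, away from the SFT one.

    theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)
               = (1 + alpha) * theta_aligned - alpha * theta_sft
    """
    return {
        name: (1 + alpha) * aligned[name] - alpha * sft[name]
        for name in aligned
    }
```

With `alpha = 0.3` this pushes each weight 30% further along the SFT-to-aligned direction; in practice the same expression is applied to PyTorch tensors loaded from the two checkpoints.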

Key Enhancements & Performance

The primary differentiator of this model is its extrapolation method, which boosts performance in conversational and preference-based evaluations. On the AlpacaEval 2.0 benchmark, the ExPO model achieves a 40.6% win rate and a 45.8% length-controlled (LC) win rate, up from the original Llama-3-Instruct-8B-SimPO's 40.5% and 44.7% respectively, indicating better alignment with human preferences. The technique also generalizes: in the evaluation tables for both AlpacaEval 2.0 and MT-Bench, the "+ ExPO" variants consistently outperform their original counterparts across a range of base models.

Ideal Use Cases

  • Conversational AI: Its enhanced human preference alignment makes it suitable for chatbots and interactive agents where natural and preferred responses are crucial.
  • Benchmarking and Research: Useful for researchers exploring weak-to-strong extrapolation techniques and their impact on model alignment and performance.
  • Applications requiring high win rates: For scenarios where maximizing user satisfaction and preferred outcomes in generative tasks is a priority.
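For the conversational use case above, prompts must follow the Llama-3 Instruct chat format the base model was trained on. The sketch below builds that prompt string by hand for illustration; in practice you would let the tokenizer's `apply_chat_template` method do this, and the helper name here is hypothetical.

```python
def format_llama3_prompt(messages: list[dict]) -> str:
    """Render a list of {role, content} messages in the Llama-3 chat format."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn is delimited by header tokens and terminated with <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Generation should stop on `<|eot_id|>` (and `<|end_of_text|>`), matching the stop tokens of Meta-Llama-3-8B-Instruct.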

Popular Sampler Settings

Featherless tracks the top 3 parameter combinations its users apply to this model. The sampler parameters covered by these configurations are:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
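A request carrying these parameters might look like the following. This is a hedged sketch, not the actual Featherless user statistics: the values are illustrative defaults, and when using an OpenAI-compatible client, non-standard parameters such as `top_k`, `repetition_penalty`, and `min_p` typically need to be passed in an `extra_body` field rather than as top-level arguments.

```python
# Illustrative sampler configuration; values are assumptions, not measured stats.
sampler_config = {
    "temperature": 0.7,          # randomness of sampling
    "top_p": 0.9,                # nucleus sampling cutoff
    "top_k": 40,                 # restrict to the k most likely tokens
    "frequency_penalty": 0.0,    # penalize tokens by occurrence count
    "presence_penalty": 0.0,     # penalize tokens that appeared at all
    "repetition_penalty": 1.1,   # multiplicative repetition discouragement
    "min_p": 0.05,               # drop tokens below this fraction of the top prob
}

# Split into standard OpenAI-style arguments vs. extra_body extensions.
standard = {k: sampler_config[k]
            for k in ("temperature", "top_p", "frequency_penalty", "presence_penalty")}
extra_body = {k: sampler_config[k]
              for k in ("top_k", "repetition_penalty", "min_p")}
```

The resulting `standard` and `extra_body` dicts can then be passed to a chat-completions call against an OpenAI-compatible endpoint.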