chujiezheng/Smaug-Llama-3-70B-Instruct-ExPO

Status: Warm
Visibility: Public
Parameters: 70B
Quantization: FP8
Context length: 8192 tokens
License: llama3
Source: Hugging Face
Smaug-Llama-3-70B-Instruct-ExPO Overview

This model, developed by chujiezheng, is a 70-billion-parameter instruction-tuned language model derived from abacusai/Smaug-Llama-3-70B-Instruct and meta-llama/Meta-Llama-3-70B-Instruct. Its key differentiator is the application of the ExPO (weak-to-strong extrapolation) method with an alpha value of 0.3, which extrapolates the aligned Smaug weights further away from the Meta-Llama-3 base to strengthen alignment with human preferences.
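
For intuition, ExPO is a simple weight-space operation: the aligned model's parameters are pushed further along the direction pointing away from the base model. The sketch below assumes the extrapolation formula from the ExPO paper, theta_expo = theta_aligned + alpha * (theta_aligned - theta_base), with alpha = 0.3; loading two 70B checkpoints in one process like this is purely illustrative, as real usage would shard or stream the weights.

```python
# Minimal sketch of ExPO weight extrapolation (assumed formulation:
# theta_expo = theta_aligned + alpha * (theta_aligned - theta_base)).
import torch
from transformers import AutoModelForCausalLM

ALPHA = 0.3  # extrapolation coefficient reported for this model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct", torch_dtype=torch.bfloat16
)
aligned = AutoModelForCausalLM.from_pretrained(
    "abacusai/Smaug-Llama-3-70B-Instruct", torch_dtype=torch.bfloat16
)

# Push each aligned parameter further along the (aligned - base) direction.
with torch.no_grad():
    base_params = dict(base.named_parameters())
    for name, p in aligned.named_parameters():
        p.add_(ALPHA * (p - base_params[name]))

aligned.save_pretrained("Smaug-Llama-3-70B-Instruct-ExPO")
```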

Key Enhancements & Performance

The ExPO method has shown consistent improvements across benchmarks:

  • AlpacaEval 2.0: ExPO-extrapolated models achieve higher win rates than the checkpoints they are derived from. For instance, RLHFlow/LLaMA3-iterative-DPO-final improved from a 29.2% to a 32.7% win rate with ExPO.
  • MT-Bench: Scores likewise improve, indicating better conversational quality and instruction following; the same model rose from 8.08 to 8.45.

These results suggest that the extrapolation technique effectively enhances the model's ability to generate preferred responses across a range of tasks.

Ideal Use Cases

This model is particularly well-suited for applications where:

  • High-quality instruction following is critical.
  • Improved alignment with human preferences is desired for conversational agents or assistants.
  • Benchmarked performance on common evaluation datasets like AlpacaEval 2.0 and MT-Bench is a priority.
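
For these use cases, the snippet below is a minimal inference sketch using the Hugging Face transformers library and the standard Llama 3 chat template; the prompt and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal inference sketch; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chujiezheng/Smaug-Llama-3-70B-Instruct-ExPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Draft a polite follow-up email to a customer."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```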