chujiezheng/Smaug-Llama-3-70B-Instruct-ExPO

Text Generation | Concurrency Cost: 4 | Model Size: 70B | Quant: FP8 | Ctx Length: 8k | Published: May 19, 2024 | License: llama3 | Architecture: Transformer

Smaug-Llama-3-70B-Instruct-ExPO is a 70-billion-parameter instruction-tuned language model developed by chujiezheng, based on abacusai/Smaug-Llama-3-70B-Instruct and meta-llama/Meta-Llama-3-70B-Instruct. It applies an extrapolation (ExPO) method with an alpha of 0.3 to improve alignment with human preferences. It is reported to achieve higher win rates on the AlpacaEval 2.0 benchmark and higher scores on MT-Bench than its base models, which makes it a candidate for applications that need strong conversational and instruction-following capabilities.


Smaug-Llama-3-70B-Instruct-ExPO Overview

This model, developed by chujiezheng, is a 70-billion-parameter instruction-tuned language model derived from abacusai/Smaug-Llama-3-70B-Instruct and meta-llama/Meta-Llama-3-70B-Instruct. Its key differentiator is the extrapolation (ExPO) method, applied with an alpha value of 0.3, which is intended to improve alignment with human preferences beyond that of the models it is derived from.
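The extrapolation step can be illustrated with a minimal sketch. It assumes, since the page does not state it, that ExPO moves each weight further along the direction from a starting checkpoint to an aligned checkpoint: extrapolated = aligned + alpha * (aligned - base). Treating meta-llama/Meta-Llama-3-70B-Instruct as the starting checkpoint and abacusai/Smaug-Llama-3-70B-Instruct as the aligned one is also an assumption. The function name `expo_extrapolate` and the toy weight dictionaries are hypothetical and stand in for real model state dicts.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def expo_extrapolate(base: Weights, aligned: Weights, alpha: float = 0.3) -> Weights:
    """Extrapolate past the aligned weights, away from the base weights.

    Assumed update rule (not confirmed by the page):
        extrapolated = aligned + alpha * (aligned - base)
    """
    if base.keys() != aligned.keys():
        raise ValueError("base and aligned checkpoints must share parameter names")

    extrapolated: Weights = {}
    for name, aligned_vals in aligned.items():
        base_vals = base[name]
        if len(base_vals) != len(aligned_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        extrapolated[name] = [
            a + alpha * (a - b) for a, b in zip(aligned_vals, base_vals)
        ]
    return extrapolated


# Hypothetical two-value "checkpoints" for illustration only.
base_weights = {"layer.weight": [1.0, 2.0]}
aligned_weights = {"layer.weight": [2.0, 2.0]}
expo_weights = expo_extrapolate(base_weights, aligned_weights, alpha=0.3)
print(expo_weights)
```

With alpha set to 0 the result equals the aligned checkpoint, and a weight that did not change between the two checkpoints stays unchanged for any alpha.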

Key Enhancements & Performance

The ExPO method is reported to improve results on two benchmarks. The figures quoted below are for a different model, RLHFlow/LLaMA3-iterative-DPO-final, and are given as an illustration of the effect of ExPO:

  • AlpacaEval 2.0: Win rates increase relative to the original model. For RLHFlow/LLaMA3-iterative-DPO-final, the win rate improved from 29.2% to 32.7% with ExPO.
  • MT-Bench: Scores also rise, indicating better conversational quality and instruction following. RLHFlow/LLaMA3-iterative-DPO-final improved from 8.08 to 8.45.

These results suggest that the extrapolation technique effectively enhances the model's ability to generate preferred responses across a range of tasks.
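As a quick check on the size of these gains, the short sketch below computes the absolute and relative improvement from the before and after figures quoted for RLHFlow/LLaMA3-iterative-DPO-final. The helper name `gain` is hypothetical; only the four input numbers come from the text.

```python
from typing import Tuple


def gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain in percent) between two scores."""
    absolute = after - before
    relative_pct = 100.0 * absolute / before
    return absolute, relative_pct


# Figures quoted for RLHFlow/LLaMA3-iterative-DPO-final, without and with ExPO.
alpaca_abs, alpaca_rel = gain(29.2, 32.7)  # AlpacaEval 2.0 win rate, in percent
mt_abs, mt_rel = gain(8.08, 8.45)          # MT-Bench score

print(f"AlpacaEval 2.0: +{alpaca_abs:.1f} points ({alpaca_rel:.1f}% relative)")
print(f"MT-Bench: +{mt_abs:.2f} ({mt_rel:.1f}% relative)")
```

This works out to about 3.5 percentage points (roughly 12% relative) on AlpacaEval 2.0 and about 0.37 (roughly 4.6% relative) on MT-Bench.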

Ideal Use Cases

This model is particularly well-suited for applications where:

  • High-quality instruction following is critical.
  • Improved alignment with human preferences is desired for conversational agents or assistants.
  • Strong results on common evaluation benchmarks such as AlpacaEval 2.0 and MT-Bench are a priority.
