chujiezheng/LLaMA3-iterative-DPO-final-ExPO

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: May 18, 2024 · License: llama3 · Architecture: Transformer

The chujiezheng/LLaMA3-iterative-DPO-final-ExPO is an 8-billion-parameter LLaMA3-based language model with an 8192-token context length. Developed by chujiezheng, it applies the extrapolation (ExPO) technique with alpha = 0.3 to RLHFlow's LLaMA3-iterative-DPO-final and LLaMA3-SFT checkpoints. It is designed to improve alignment with human preferences, achieving higher win rates on AlpacaEval 2.0 and higher MT-Bench scores than its base models and other LLMs.


Model Overview

The chujiezheng/LLaMA3-iterative-DPO-final-ExPO is an 8-billion-parameter language model derived from the LLaMA3 architecture. It is built from the RLHFlow/LLaMA3-iterative-DPO-final and RLHFlow/LLaMA3-SFT models using the extrapolation (ExPO) technique described in the paper "Weak-to-Strong Extrapolation Expedites Alignment". ExPO extrapolates from the weights of the SFT checkpoint past the DPO/RLHF checkpoint, with an alpha value of 0.3, to achieve stronger alignment with human preferences.
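The extrapolation step can be sketched as a simple per-parameter update that moves past the DPO checkpoint along the direction from the SFT checkpoint. This is a minimal illustration with toy weight dictionaries, not the authors' released code; the function name and dict representation are assumptions for clarity:

```python
def expo_extrapolate(sft_weights, dpo_weights, alpha=0.3):
    """ExPO sketch: extrapolate past the aligned (DPO) checkpoint
    along the SFT -> DPO direction, scaled by alpha.

    theta_expo = theta_dpo + alpha * (theta_dpo - theta_sft)

    Weights are dicts mapping parameter names to values (stand-ins
    for real model tensors).
    """
    return {
        name: dpo + alpha * (dpo - sft_weights[name])
        for name, dpo in dpo_weights.items()
    }

# Toy usage: one scalar "parameter" moving from 1.0 (SFT) to 2.0 (DPO)
# is pushed further in the same direction, to 2.3.
extrapolated = expo_extrapolate({"w": 1.0}, {"w": 2.0}, alpha=0.3)
```

With alpha = 0, this returns the DPO weights unchanged; the paper's insight is that a small positive alpha often yields a better-aligned model than the DPO checkpoint itself.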

Key Capabilities & Performance

This model demonstrates enhanced performance across several benchmarks, indicating improved alignment and response quality:

  • AlpacaEval 2.0: Consistently achieves higher win rates and length-controlled (LC) win rates than its base models and other evaluated LLMs, including various Zephyr, Starling-LM, Snorkel, InternLM2-chat, and Tulu-2-dpo variants. For instance, it raises RLHFlow/LLaMA3-iterative-DPO-final's win rate from 29.2% to 32.7%.
  • MT-Bench: Achieves improved scores on the MT-Bench benchmark, with the ExPO version outperforming the original models across all tested architectures. For example, RLHFlow/LLaMA3-iterative-DPO-final's MT-Bench score increases from 8.08 to 8.45 with ExPO.

Use Cases

This model is particularly well-suited for applications requiring high-quality, human-aligned text generation and conversational AI, where improved preference alignment leads to more satisfactory user interactions. Its enhanced performance on general conversational benchmarks suggests its utility in chatbots, content generation, and interactive AI systems.

Popular Sampler Settings

The three parameter combinations most used by Featherless users for this model adjust the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
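To illustrate what one of these settings does, here is a minimal sketch of top-p (nucleus) filtering over a token distribution. The toy probabilities and function name are illustrative assumptions, not values from this model:

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p, then renormalize.

    probs: dict mapping token -> probability (toy stand-in for a
    model's next-token distribution).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # nucleus reached; drop the remaining tail
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Toy usage: with top_p=0.7, tokens "a" (0.5) and "b" (0.3) cover the
# nucleus and "c" (0.2) is dropped before sampling.
filtered = top_p_filter({"a": 0.5, "b": 0.3, "c": 0.2}, top_p=0.7)
```

The other listed parameters shape the same distribution in different ways: temperature rescales logits, top_k and min_p prune the tail by rank or by minimum probability, and the penalty settings discourage repeated tokens.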