chujiezheng/LLaMA3-iterative-DPO-final-ExPO

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: May 18, 2024 · License: llama3 · Architecture: Transformer

The chujiezheng/LLaMA3-iterative-DPO-final-ExPO is an 8-billion-parameter LLaMA3-based language model with an 8192-token context length. Developed by chujiezheng, it applies the extrapolation (ExPO) technique with alpha = 0.3 to RLHFlow's LLaMA3-iterative-DPO-final and LLaMA3-SFT checkpoints. It is designed to improve alignment with human preferences, achieving higher win rates on AlpacaEval 2.0 and higher MT-Bench scores than its base models and other LLMs.


Model Overview

The chujiezheng/LLaMA3-iterative-DPO-final-ExPO is an 8-billion-parameter language model derived from the LLaMA3 architecture. It is built from the RLHFlow/LLaMA3-iterative-DPO-final and RLHFlow/LLaMA3-SFT models using the extrapolation (ExPO) technique described in the paper "Weak-to-Strong Extrapolation Expedites Alignment". ExPO extrapolates from the weights of the SFT checkpoint past the DPO/RLHF checkpoint, with an alpha value of 0.3, to achieve stronger alignment with human preferences.
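The extrapolation step can be sketched as a simple per-parameter update that moves past the DPO checkpoint along the direction from the SFT checkpoint. This is a minimal illustration with toy weight dictionaries, not the authors' released code; the function name and dict representation are assumptions for clarity:

```python
def expo_extrapolate(sft_weights, dpo_weights, alpha=0.3):
    """ExPO sketch: extrapolate past the aligned (DPO) checkpoint
    along the SFT -> DPO direction, scaled by alpha.

    theta_expo = theta_dpo + alpha * (theta_dpo - theta_sft)

    Weights are dicts mapping parameter names to values (stand-ins
    for real model tensors).
    """
    return {
        name: dpo + alpha * (dpo - sft_weights[name])
        for name, dpo in dpo_weights.items()
    }

# Toy usage: one scalar "parameter" moving from 1.0 (SFT) to 2.0 (DPO)
# is pushed further in the same direction, to 2.3.
extrapolated = expo_extrapolate({"w": 1.0}, {"w": 2.0}, alpha=0.3)
```

With alpha = 0, this returns the DPO weights unchanged; the paper's insight is that a small positive alpha often yields a better-aligned model than the DPO checkpoint itself.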

Key Capabilities & Performance

This model demonstrates enhanced performance across several benchmarks, indicating improved alignment and response quality:

  • AlpacaEval 2.0: Consistently achieves higher win rates and length-controlled (LC) win rates than its base models and other evaluated LLMs, including various Zephyr, Starling-LM, Snorkel, InternLM2-chat, and Tulu-2-dpo variants. For instance, it raises RLHFlow/LLaMA3-iterative-DPO-final's win rate from 29.2% to 32.7%.
  • MT-Bench: Achieves improved scores on the MT-Bench benchmark, with the ExPO version outperforming the original models across all tested architectures. For example, RLHFlow/LLaMA3-iterative-DPO-final's MT-Bench score increases from 8.08 to 8.45 with ExPO.

Use Cases

This model is particularly well-suited for applications requiring high-quality, human-aligned text generation and conversational AI, where improved preference alignment leads to more satisfactory user interactions. Its enhanced performance on general conversational benchmarks suggests its utility in chatbots, content generation, and interactive AI systems.

Popular Sampler Settings

The three parameter combinations most used by Featherless users for this model adjust the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
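To illustrate what one of these settings does, here is a minimal sketch of top-p (nucleus) filtering over a token distribution. The toy probabilities and function name are illustrative assumptions, not values from this model:

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p, then renormalize.

    probs: dict mapping token -> probability (toy stand-in for a
    model's next-token distribution).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # nucleus reached; drop the remaining tail
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Toy usage: with top_p=0.7, tokens "a" (0.5) and "b" (0.3) cover the
# nucleus and "c" (0.2) is dropped before sampling.
filtered = top_p_filter({"a": 0.5, "b": 0.3, "c": 0.2}, top_p=0.7)
```

The other listed parameters shape the same distribution in different ways: temperature rescales logits, top_k and min_p prune the tail by rank or by minimum probability, and the penalty settings discourage repeated tokens.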