Name: chujiezheng/Smaug-34B-v0.1-ExPO API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: chujiezheng

Smaug-34B-v0.1-ExPO Overview

Smaug-34B-v0.1-ExPO is a 34-billion-parameter language model released by chujiezheng, based on abacusai/Smaug-34B-v0.1 and jondurbin/bagel-34b-v0.2. Its core differentiator is the ExPO (extrapolation) method described in the paper "Weak-to-Strong Extrapolation Expedites Alignment". The model is obtained by extrapolating, with an alpha value of 0.3, from the weights of the existing SFT and DPO/RLHF checkpoints, with the aim of achieving better alignment with human preferences.
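The extrapolation step can be illustrated with a minimal sketch. It assumes the usual ExPO update, in which each extrapolated weight equals the aligned (DPO/RLHF) weight plus alpha times its difference from the SFT weight. The function name expo_extrapolate and the toy two-parameter "checkpoints" are hypothetical and stand in for real model state; this is not the author's release script.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of weights (assumption for illustration).
Weights = Dict[str, List[float]]


def expo_extrapolate(sft: Weights, aligned: Weights, alpha: float = 0.3) -> Weights:
    """Return aligned + alpha * (aligned - sft), computed per parameter."""
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    result: Weights = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        if len(w_sft) != len(w_aligned):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Move further along the direction SFT -> aligned by a factor of alpha.
        result[name] = [a + alpha * (a - s) for s, a in zip(w_sft, w_aligned)]
    return result


# Hypothetical toy checkpoints, not real model weights.
sft_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
dpo_weights = {"layer.weight": [2.0, 2.0], "layer.bias": [1.0]}

expo_weights = expo_extrapolate(sft_weights, dpo_weights, alpha=0.3)
print(expo_weights)
```

With alpha = 0 the function returns the aligned checkpoint unchanged; larger values push the weights further past it in the same direction. No gradient updates are involved in this step.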

Key Capabilities & Performance

The effectiveness of the ExPO method is demonstrated primarily on two benchmarks, with results reported for several models:

AlpacaEval 2.0: ExPO consistently raises win rates and length-controlled (LC) win rates across various base models compared with their original versions. For instance, it lifts internlm/internlm2-chat-20b's win rate from 36.1% to 46.2%.
MT-Bench: Similarly, ExPO-extrapolated models score higher on MT-Bench, indicating enhanced conversational abilities. For example, RLHFlow/LLaMA3-iterative-DPO-final improves from 8.08 to 8.45 with ExPO.

These results suggest that the ExPO method successfully enhances the model's ability to generate responses that are more aligned with human judgments.
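To put the two quoted examples on a common footing, the short sketch below computes the absolute and relative gains from the figures given above. The helper name gain is hypothetical; the inputs are the only numbers taken from this page.

```python
from typing import Tuple


def gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain in percent), rounded for display."""
    absolute = after - before
    relative = absolute / before * 100
    return round(absolute, 2), round(relative, 1)


# AlpacaEval 2.0 win rate (%) for internlm/internlm2-chat-20b, original vs. ExPO.
win_rate_gain = gain(36.1, 46.2)
# MT-Bench score for RLHFlow/LLaMA3-iterative-DPO-final, original vs. ExPO.
mt_bench_gain = gain(8.08, 8.45)

print(win_rate_gain)  # (10.1, 28.0)
print(mt_bench_gain)  # (0.37, 4.6)
```

The win-rate example is a gain of 10.1 percentage points (about 28% relative), while the MT-Bench example is a gain of 0.37 points (about 4.6% relative).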

Ideal Use Cases

This model is particularly well-suited for applications where:

Human preference alignment is critical: ExPO weight extrapolation specifically targets improved alignment with human preferences.
Enhanced conversational quality is desired: The MT-Bench improvements indicate stronger performance in dialogue-based tasks.
Strong existing base models are the starting point: It builds upon and enhances established models like Smaug-34B and Bagel-34B.
