Name: allenai/llama-3.1-tulu-2-dpo-8b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: allenai

Model Overview

allenai/llama-3.1-tulu-2-dpo-8b is an 8-billion-parameter language model developed by AllenAI, building upon the Meta-Llama-3.1-8B architecture. It is part of the Tulu series and is designed to function as a helpful assistant.

Key Characteristics & Training

The model is fine-tuned in two stages:

Initial Fine-tuning: Trained on a diverse blend of publicly available, synthetic, and human-created datasets, as detailed in the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2".
DPO Alignment: Further aligned using Direct Preference Optimization (DPO) on the UltraFeedback dataset, which contains 64k prompts and GPT-4 ranked model completions. This DPO step aims to improve instruction following and helpfulness.
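
The DPO step above can be illustrated with a minimal sketch of the per-example DPO objective. This is not AllenAI's training code: the function name, argument names, and the beta value of 0.1 are assumptions chosen for illustration, and the inputs are assumed to be summed log-probabilities of each completion under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch; beta=0.1 is an assumed value).

    Each argument is the summed log-probability of a completion:
    "chosen" is the preferred completion, "rejected" the dispreferred one.
    """
    # How much more likely the policy makes each completion than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; a larger margin means the policy favours the chosen completion more.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
    # Policy prefers the chosen completion more than the reference does: lower loss.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss falls as the policy raises the likelihood of the preferred completion relative to the reference model and lowers that of the rejected one, which is how ranked completions such as those in UltraFeedback can be used without training a separate reward model.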

Performance Highlights

Compared to Llama 3.1 8B Instruct, this DPO-tuned model improves on some benchmarks and regresses on others:

TruthfulQA: Achieves 70.3% (%Info+True), well above Llama 3.1 8B Instruct's 31.1%.
IFEval (loose acc): Scores 52.3%, below Llama 3.1 8B Instruct's 75.6% but above the non-DPO Tulu 2 Llama 3.1 8B.
Codex HumanEval Pass@10: Reaches 69.1%.
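
For readers unfamiliar with the Pass@10 metric, the sketch below shows the unbiased pass@k estimator commonly used for HumanEval-style evaluation. The listing does not say how the 69.1% figure was computed, so treating it as this estimator is an assumption; the function name and the sample counts in the usage lines are illustrative only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes.

    n: total samples generated for a problem
    c: number of those samples that pass the unit tests
    k: sample budget being evaluated (10 for Pass@10)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up entirely of failing samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 20 samples for one problem, 3 of which pass.
    print(pass_at_k(20, 3, 10))
    print(pass_at_k(20, 3, 1))
```

A benchmark-level score would then be the mean of this per-problem estimate over all problems in the suite.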

Intended Use
