allenai/llama-3.1-tulu-2-dpo-8b

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Aug 9, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

allenai/llama-3.1-tulu-2-dpo-8b is an 8-billion-parameter instruction-tuned language model developed by AllenAI, fine-tuned from Meta-Llama-3.1-8B. It was first trained on a diverse mix of publicly available, synthetic, and human-created datasets, then further aligned using Direct Preference Optimization (DPO) on the UltraFeedback dataset. The model is designed to act as a helpful assistant and supports a 32,768-token context length. It shows improved truthfulness and instruction following compared to its base model.

Model Overview

allenai/llama-3.1-tulu-2-dpo-8b is an 8 billion parameter language model developed by AllenAI, building upon the Meta-Llama-3.1-8B architecture. It is part of the Tulu series, designed to function as a helpful assistant.

Key Characteristics & Training

The model was fine-tuned in two stages:

  • Initial Fine-tuning: Trained on a diverse blend of publicly available, synthetic, and human-created datasets, as detailed in the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2".
  • DPO Alignment: Further aligned using Direct Preference Optimization (DPO) on the UltraFeedback dataset, which contains 64k prompts and GPT-4 ranked model completions. This DPO step aims to improve instruction following and helpfulness.
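
The DPO step above can be illustrated with a minimal sketch of the per-example DPO objective. This is the standard DPO loss formulation, not the training code used for this model; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration. Each input is the summed log-probability of a full completion under either the model being trained (the policy) or the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss (illustrative sketch; beta is an assumed value)."""
    # How much more the policy favors each completion than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Hypothetical log-probabilities for one preference pair.
loss = dpo_loss(-5.0, -9.0, -6.0, -6.0)
print(f"DPO loss: {loss:.4f}")
```

The loss falls below log 2 when the policy prefers the chosen completion over the rejected one more strongly than the reference model does, which is the behavior the preference data rewards.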

Performance Highlights

Compared to the Llama 3.1 8B Instruct model, this DPO-tuned version shows mixed benchmark results:

  • TruthfulQA: Achieves 70.3% (%Info+True), significantly higher than Llama 3.1 8B Instruct's 31.1%.
  • IFEval (loose acc): Scores 52.3%, below Llama 3.1 8B Instruct's 75.6% but above the non-DPO Tulu 2 Llama 3.1 8B.
  • Codex HumanEval Pass@10: Reaches 69.1%.

Intended Use

This model is intended for use as a helpful assistant, particularly in scenarios requiring robust instruction following and truthful responses. Users should format inputs using the <|user|> and <|assistant|> tags, ensuring a newline after <|assistant|> for optimal generation quality.
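
The input format described above can be sketched as a small helper. The function name `build_tulu_prompt` is hypothetical, and placing each tag on its own line (including a newline after <|user|>) is an assumption; only the tags themselves and the newline after <|assistant|> come from the description above.

```python
def build_tulu_prompt(user_message: str) -> str:
    """Wrap a single user message in <|user|> / <|assistant|> tags.

    Assumption: each tag sits on its own line. The trailing newline after
    <|assistant|> is the part the model card explicitly calls for.
    """
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"

prompt = build_tulu_prompt("Summarize the DPO training stage in one sentence.")
print(repr(prompt))
```

The returned string can be passed as the raw prompt to whichever text-generation interface is serving the model.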