allenai/llama-3-tulu-2-dpo-70b
Hugging Face
Text Generation | Concurrency Cost: 4 | Model Size: 70B | Quant: FP8 | Ctx Length: 8k | Published: Jun 20, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

allenai/llama-3-tulu-2-dpo-70b is a 70 billion parameter language model developed by AllenAI, fine-tuned from Meta's Llama 3. It is trained as a helpful assistant using a mix of public, synthetic, and human-created datasets, and further aligned with DPO on the UltraFeedback dataset. The model primarily supports English, is designed for general conversational AI tasks, and reports benchmark scores such as 0.754 on MMLU (5-shot).

Model Overview

allenai/llama-3-tulu-2-dpo-70b is a 70 billion parameter language model from AllenAI, fine-tuned from Meta's Llama 3. It is part of the Tulu series, designed to function as a helpful assistant. The model was trained in two stages: initial fine-tuning on a diverse mix of publicly available, synthetic, and human-created datasets, followed by alignment with Direct Preference Optimization (DPO) on the UltraFeedback dataset.
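
To make the DPO stage more concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. It is illustrative only: the function name, argument names, and the beta value of 0.1 are assumptions, and this page does not state the hyperparameters or training code actually used for this model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of responses.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    beta=0.1 is an illustrative default, not a value taken from this page.
    """
    # How much more (in log space) the policy likes each response
    # than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # Numerically stable -log(sigmoid(logits)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The loss drops below log(2) once the policy prefers the chosen response
# more strongly than the reference model does.
neutral = dpo_loss(-11.0, -11.0, -11.0, -11.0)
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

Minimizing this loss pushes the policy to raise the likelihood of the preferred response relative to the rejected one, while the reference terms keep it anchored to the fine-tuned starting point.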

Key Capabilities & Performance

This model is primarily English-centric and reports results on benchmarks including MMLU, GSM8k, BBH, and HumanEval. Notably, it scores 0.754 on MMLU (5-shot), 0.860 on GSM8k (8-shot, chain-of-thought), and 0.878 on Codex HumanEval (Pass@10). The DPO training phase, which uses the UltraFeedback dataset, aims to make the model more likely to generate preferred responses.
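
For readers unfamiliar with the Pass@10 metric, the sketch below shows the commonly used unbiased pass@k estimator for code benchmarks. Whether the 0.878 figure above was computed with exactly this estimator is an assumption; the page does not say. The function name and the sample counts in the example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples generated for a problem
    c: number of those samples that pass the unit tests
    k: sample budget being evaluated (k <= n)
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k
        # must contain at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 20 samples generated, 3 of them pass.
score = pass_at_k(n=20, c=3, k=10)
```

A benchmark-level Pass@10 score is then the mean of this per-problem estimate over all problems.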

Intended Uses & Limitations

  • Use Cases: Ideal for conversational AI, instruction following, and general assistant-like applications.
  • Input Format: For best generation quality, place the user message between the two chat markers: <|user|> Your message here! <|assistant|>
  • Limitations: The model has not undergone extensive safety alignment and is not paired with safeguards such as in-the-loop response filtering, so it may produce problematic outputs, especially when explicitly prompted to do so. The exact composition of the base Llama 3 training corpus is not fully disclosed.
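
The input format listed above can be sketched as a small helper. This is a minimal example, not an official API: the function name is hypothetical, and the newline placement (a line break after each marker, including a trailing one after <|assistant|>) is an assumption based on the convention used by other Tulu models, since the flattened format string on this page does not show line breaks.

```python
def build_tulu_prompt(user_message: str) -> str:
    """Wrap a single user message in the <|user|> / <|assistant|> markers."""
    # Assumption: each marker sits on its own line and the prompt ends
    # with a newline after <|assistant|>, so generation starts on a
    # fresh line.
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"


prompt = build_tulu_prompt("Your message here!")
print(prompt)
```

The resulting string is what gets sent to the model as the raw prompt; the model's completion is the assistant turn.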