allenai/tulu-2-dpo-13b

Parameters: 13B
Precision: FP8
Context length: 4096 tokens
License: other
Model card: Hugging Face
Overview

Tulu V2 DPO 13B: A DPO-Aligned Llama 2 Assistant

allenai/tulu-2-dpo-13b is a 13-billion-parameter language model from the Allen Institute for AI (AI2), built on the Llama 2 architecture. It was instruction-tuned on a diverse mix of publicly available, synthetic, and human-created datasets, then further aligned with Direct Preference Optimization (DPO), aiming to function as a highly helpful assistant.
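
For context, DPO trains directly on preference pairs: given a prompt x with a chosen response y_w and a rejected response y_l, it minimizes a logistic loss on the policy's log-probability margin against a frozen reference model. The objective below is the standard formulation from the DPO paper (Rafailov et al., 2023), not something specific to this model:

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

Here, \(\pi_\theta\) is the model being trained, \(\pi_{\text{ref}}\) is the instruction-tuned reference model, and \(\beta\) controls how far the policy may drift from the reference.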

Key Capabilities & Features

  • DPO Alignment: Uses Direct Preference Optimization for improved conversational quality and alignment, following a recipe similar to the one used for the Zephyr Beta model.
  • Strong Performance: Achieves a 7.00 MT-Bench score and an 89.5% AlpacaEval win rate, positioning it as a robust alternative to Llama 2 13B Chat.
  • Diverse Training Data: Instruction-tuned on a filtered version of the Tulu V2 mix dataset, which includes human-created instructions and synthetic dialogues, then further aligned on the openbmb/UltraFeedback dataset (64k prompts with responses ranked by GPT-4).
  • Instruction Following: Optimized for chat-based interactions, requiring a specific input format (<|user|>\nYour message here!\n<|assistant|>\n); see the sketch after this list.
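
A minimal usage sketch of that format, assuming the Hugging Face transformers and torch libraries and a GPU with enough memory. The model id and prompt format come from this card; the dtype, device settings, and sampling parameters are illustrative choices, not values the card prescribes.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allenai/tulu-2-dpo-13b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumption: bf16 for local inference
    device_map="auto",
)

# The format the card requires: a newline after each role marker, ending
# with <|assistant|>\n so the model replies as the assistant.
prompt = "<|user|>\nWrite a haiku about preference optimization.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

The upstream Hugging Face card stresses that the newline after <|assistant|> matters: omitting it can noticeably degrade generation quality.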

Intended Use Cases

  • Chatbot Development: Ideal for creating helpful and aligned conversational AI agents; a multi-turn sketch follows this list.
  • Assistant Applications: Suitable for tasks requiring detailed instruction following and natural language understanding.
  • Research & Development: Provides a strong base for further experimentation with DPO-aligned models.
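
As a sketch of the chatbot use case above, a simple loop can accumulate turns in the same <|user|>/<|assistant|> format. It reuses the model and tokenizer objects from the previous snippet; the history handling is an illustrative simplification, not a recommended production pattern.

```python
def chat() -> None:
    """Toy multi-turn REPL; type 'quit' or 'exit' to stop."""
    history = ""
    while True:
        user_msg = input("You: ")
        if user_msg.strip().lower() in {"quit", "exit"}:
            break
        # Append the new turn and close with <|assistant|>\n so the
        # model continues as the assistant.
        history += f"<|user|>\n{user_msg}\n<|assistant|>\n"
        inputs = tokenizer(history, return_tensors="pt").to(model.device)
        output = model.generate(
            **inputs, max_new_tokens=512, do_sample=True, temperature=0.7
        )
        reply = tokenizer.decode(
            output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        print(f"Assistant: {reply}")
        # Keep the assistant's reply in the running history for context.
        history += reply + "\n"

chat()
```

Note that a real chatbot would also truncate the history to fit the 4096-token context window.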

Limitations

  • The model has not undergone extensive safety alignment beyond the DPO phase, meaning it may produce problematic outputs, especially when explicitly prompted to do so.
  • The exact composition of the base Llama 2 pretraining corpus is unknown, so the model may inherit biases or problematic content from it.