Overview
Tulu V2 DPO 13B: A DPO-Aligned Llama 2 Assistant
allenai/tulu-2-dpo-13b is a 13-billion-parameter language model from the Allen Institute for AI (AI2), built on the Llama 2 architecture. It is fine-tuned with Direct Preference Optimization (DPO) on a diverse mix of publicly available, synthetic, and human-created datasets, and is intended to serve as a highly capable, helpful assistant.
Key Capabilities & Features
- DPO Alignment: Utilizes Direct Preference Optimization for enhanced conversational quality and alignment, drawing inspiration from the Zephyr Beta model's DPO recipe.
- Strong Performance: Achieves a 7.00 MT-Bench score and an 89.5% AlpacaEval win rate, positioning it as a robust alternative to Llama 2 13B Chat.
- Diverse Training Data: Fine-tuned on a filtered version of the Tulu V2 mix dataset, which includes human-created instructions and synthetic dialogues, then further aligned with the openbmb/UltraFeedback dataset (64k prompts ranked by GPT-4).
- Instruction Following: Optimized for chat-based interactions, requiring a specific input format: `<|user|>\nYour message here!\n<|assistant|>\n` (see the generation sketch after this list).
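To illustrate the required format, here is a minimal single-turn generation sketch. It assumes the Hugging Face `transformers` library, PyTorch, and enough GPU memory to hold the 13B weights in bfloat16; the sampling parameters are illustrative, not the model card's recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-2-dpo-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to fit the 13B weights in memory
    device_map="auto",
)

# The required template: a user turn, then "<|assistant|>" followed by a newline.
prompt = "<|user|>\nWrite a haiku about open-source language models.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)

# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```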
Intended Use Cases
- Chatbot Development: Ideal for creating helpful and aligned conversational AI agents (see the multi-turn sketch after this list).
- Assistant Applications: Suitable for tasks requiring detailed instruction following and natural language understanding.
- Research & Development: Provides a strong base for further experimentation with DPO-aligned models.
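For chatbot use, multi-turn context can be packed into the same template. The helper below is hypothetical (not from the model card) and simply concatenates prior (user, assistant) turns, always ending the prompt with `<|assistant|>\n` so the model continues as the assistant.

```python
def build_prompt(history: list[tuple[str, str]], user_message: str) -> str:
    """Assemble a Tulu-style multi-turn prompt from (user, assistant) pairs."""
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"<|user|>\n{user_turn}\n")
        parts.append(f"<|assistant|>\n{assistant_turn}\n")
    # End with an open assistant tag so the model generates the next reply.
    parts.append(f"<|user|>\n{user_message}\n<|assistant|>\n")
    return "".join(parts)

# Example: one completed exchange plus a new question.
history = [("What is DPO?", "Direct Preference Optimization aligns a model to "
            "preference data directly, without training a separate reward model.")]
print(build_prompt(history, "How does it differ from RLHF?"))
```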
Limitations
- The model has not undergone extensive safety alignment beyond the DPO phase, meaning it may produce problematic outputs, especially when explicitly prompted to do so.
- The exact composition of the base Llama 2 pretraining corpus is unknown, so the model may carry biases inherited from that data.