allenai/tulu-2-dpo-70b
allenai/tulu-2-dpo-70b is a 69-billion-parameter language model developed by the Allen Institute for AI (AI2). It is a fine-tuned version of Llama 2, trained with Direct Preference Optimization (DPO) on a mix of publicly available, synthetic, and human-created datasets. It is designed to act as a helpful assistant for conversational tasks and is presented as a strong alternative to Llama 2 70b Chat.
Overview
allenai/tulu-2-dpo-70b is a 69-billion-parameter instruction-tuned chat model developed by the Allen Institute for AI (AI2). It is a fine-tuned version of meta-llama/Llama-2-70b-hf, trained using Direct Preference Optimization (DPO) on a diverse mix of publicly available, synthetic, and human-created datasets. The model is designed to act as a helpful assistant and is presented as a strong alternative to Llama 2 70b Chat.
Key Capabilities & Performance
- Architecture: Fine-tuned Llama 2 70B model.
- Training Method: Utilizes Direct Preference Optimization (DPO) for alignment, building on a recipe similar to Zephyr Beta.
- Dataset: Initial fine-tuning on the Tulu V2 mix dataset (human-created instructions and synthetic dialogues), followed by DPO alignment on the openbmb/UltraFeedback dataset (64k prompts ranked by GPT-4).
- Performance: Achieves a high MT-Bench score of 7.89 and an AlpacaEval win rate of 95.1%, demonstrating strong conversational and instruction-following capabilities.
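To make the DPO step above concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the AI2 training code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    beta is an assumed, illustrative value; it scales how strongly the
    policy is pushed away from the reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy raises the chosen response relative to the rejected one.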
Intended Uses & Limitations
- Primary Use: Designed for conversational AI and acting as a helpful assistant.
- Input Format: Optimized for a specific user and assistant turn-based format, requiring a newline after <|assistant|> for optimal generation quality.
- Bias and Risks: The model has not undergone extensive safety alignment or in-the-loop filtering like ChatGPT, meaning it can produce problematic outputs, especially when explicitly prompted to do so. Users should be aware of potential biases inherited from its base model and training data.
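The input format described above can be assembled with a small helper. This is a minimal sketch, not an official API: the function name `build_tulu_prompt` is hypothetical, and the `<|user|>` tag and the newline placement around each message are assumptions modeled on the `<|assistant|>` tag mentioned above. The one detail taken directly from the text is that the prompt ends with `<|assistant|>` followed by a newline.

```python
def build_tulu_prompt(turns):
    """Build a prompt string from (role, text) pairs.

    Assumed layout (illustrative): each turn is "<|role|>\\ntext\\n",
    and the prompt ends with "<|assistant|>\\n" so the model continues
    as the assistant.
    """
    if not turns or turns[-1][0] != "user":
        raise ValueError("the conversation must end with a user turn")

    parts = []
    for role, text in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"<|{role}|>\n{text.strip()}\n")

    # The trailing newline after <|assistant|> matters for generation quality.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_tulu_prompt([("user", "What is DPO?")])
```

The resulting string can be passed to whatever tokenizer and generation call is used to run the model; the helper itself depends only on the standard library.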