Tulu V2 DPO 7B: A Helpful Assistant Model
Tulu V2 DPO 7B, developed by AllenAI, is a 7 billion parameter language model fine-tuned from Llama 2. It is part of the Tulu series, which focuses on building helpful assistant models. This version was instruction-tuned on a mix of publicly available, synthetic, and human-created datasets and then aligned with Direct Preference Optimization (DPO) on the openbmb/UltraFeedback dataset.
Key Capabilities & Performance
- Enhanced Alignment: Achieves strong performance on conversational tasks, with an AlpacaEval win rate of 85.1% and an MT-Bench score of 6.29, making it a competitive alternative to Llama 2 7B Chat.
- DPO Fine-tuning: Leverages Direct Preference Optimization for improved response quality and alignment with user preferences (see the loss sketch after this list).
- Instruction Following: Trained on a diverse mix of human-created instructions and synthetic dialogues.
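For readers unfamiliar with DPO, the snippet below is a minimal sketch of the core DPO objective: the policy is nudged to prefer chosen over rejected responses relative to a frozen reference model. It is an illustrative toy example, not AllenAI's training code; the tensor values and the beta setting are assumptions made for the demo.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Minimal DPO loss: negative log-sigmoid of the scaled difference in
    log-probability ratios between chosen and rejected responses.

    Each argument is a 1-D tensor of per-sequence summed log-probs;
    beta controls how strongly the policy is kept near the reference.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximise the margin between chosen and rejected log-ratios.
    losses = -F.logsigmoid(beta * (chosen_ratio - rejected_ratio))
    return losses.mean()

# Toy example with made-up log-probabilities for two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.5]))
print(loss)
```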
Intended Uses & Limitations
This model is primarily intended for use as a helpful assistant in English. Users should be aware that, unlike some other chat models, Tulu V2 DPO 7B has not undergone extensive safety alignment during its RLHF phase, so it may produce problematic outputs when prompted to do so. The model also expects a specific input format: <|user|> on its own line, followed by the message, then <|assistant|> followed by a newline; omitting that final newline can noticeably degrade generation quality.
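As a concrete illustration of that format, the snippet below builds a prompt and generates a reply with Hugging Face transformers. It is a minimal sketch assuming the model is published on the Hub as allenai/tulu-2-dpo-7b; the generation parameters are illustrative, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-2-dpo-7b"  # assumed Hugging Face Hub id for this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Note the newline after <|assistant|>; it matters for generation quality.
prompt = "<|user|>\nWrite a haiku about large language models.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```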