Model Overview
allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023 is a 13 billion parameter language model developed by AllenAI, building upon the Tulu 2 suite. It is fine-tuned from meta-llama/Llama-2-13b-hf and specifically trained using Direct Preference Optimization (DPO) on the Chatbot Arena 2023 dataset. This model is designed to act as a helpful assistant, leveraging real-world conversational data for alignment.
Key Capabilities & Training
- Assistant-Oriented: Trained to provide helpful responses in a conversational context.
- DPO Alignment: Uses Direct Preference Optimization (DPO) for alignment, starting from a model already fine-tuned on the Tulu V2 mix (a blend of public, synthetic, and human-created datasets).
- Chatbot Arena Data: The DPO preference data is the Chatbot Arena 2023 dataset, which consists of real-world chatbot conversations paired with human preference judgments.
- Input Format: For optimal generation quality, format inputs in the following chat format, with each tag on its own line and a newline after <|assistant|>:
    <|user|>
    Your message here!
    <|assistant|>
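The chat format above can be produced with a small helper. This is a minimal single-turn sketch; the function name `format_tulu_prompt` is hypothetical and not part of any AllenAI or Hugging Face API. It simply places each tag on its own line and ends with the newline after <|assistant|>.

```python
def format_tulu_prompt(user_message: str) -> str:
    """Build a single-turn prompt in the Tulu chat format (hypothetical helper).

    Each tag sits on its own line, and the prompt ends with a newline
    after <|assistant|> so the model begins its reply on a fresh line.
    """
    # Strip stray surrounding whitespace so the layout stays exact.
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"


prompt = format_tulu_prompt("Your message here!")
print(repr(prompt))  # '<|user|>\nYour message here!\n<|assistant|>\n'
```

The resulting string is what would be tokenized and passed to the model for generation; multi-turn conversations are not covered by this sketch.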
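To make the DPO Alignment point more concrete, here is a minimal sketch of the standard per-pair DPO loss, -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). It is a generic illustration, not AllenAI's training code. The function names are hypothetical, the inputs are assumed to be summed response log-probabilities, and beta = 0.1 is an illustrative default rather than a confirmed training setting for this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, assumed
) -> float:
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response
    (chosen = preferred, rejected = dispreferred) under the policy being
    trained or under the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response more
    # strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # ln 2, about 0.6931
```

When the policy matches the reference, the margin is zero and the loss equals ln 2; raising the chosen response's relative likelihood pushes it below that value.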
Intended Uses & Limitations
This model is suitable for chat-based applications requiring an assistant-like persona. However, the Tulu models have not been aligned to generate safe completions during the preference-tuning (RLHF) phase, nor are they deployed with in-the-loop filtering of responses, so they can produce problematic outputs, especially when prompted to do so. Users should be aware of the biases and risks inherent in models trained on broad internet data.