allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024
allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024 is a 13-billion-parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. The model is trained with Direct Preference Optimization (DPO) on the Chatbot Arena 2024 dataset and is intended to act as a helpful assistant. Because it is optimized directly on human preference data, it is suited to conversational AI applications where responses should align with what users prefer.
Tulu V2.5 DPO 13B - Chatbot Arena 2024 Overview
This model is part of the Tulu V2.5 series by AllenAI, designed as a helpful assistant. It is a 13 billion parameter model, fine-tuned from meta-llama/Llama-2-13b-hf using Direct Preference Optimization (DPO). The training specifically leveraged the Chatbot Arena 2024 dataset, which consists of human preference data, to align its responses with user preferences. This approach builds upon the Tulu 2 suite, incorporating DPO and PPO techniques.
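Concretely, DPO fine-tunes the policy directly on preference pairs: it increases the log-probability margin of the chosen response over the rejected one, relative to a frozen reference model, with no separate reward model. A minimal sketch of the standard DPO loss in PyTorch (the function and the beta value are illustrative, not AllenAI's actual training code):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss. Each argument is a tensor of per-sequence
    log-probabilities, log p(completion | prompt), summed over tokens."""
    # Implicit rewards: how much the policy favors each completion
    # relative to the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response out-ranks the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```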
Key Capabilities
- Preference-aligned Generation: Trained on human preference data from Chatbot Arena 2024, it is optimized to produce responses that users prefer (a hypothetical preference record is sketched after this list).
- Assistant-like Behavior: Designed to function as a helpful conversational assistant.
- DPO Fine-tuning: Utilizes Direct Preference Optimization for robust alignment.
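To make the preference data concrete, a Chatbot Arena-style record pairs one prompt with two candidate responses and a human verdict. A hypothetical example; the field names are assumptions, not the exact dataset schema:

```python
# Hypothetical preference record; field names are illustrative.
preference_example = {
    "prompt": "Explain DPO in one sentence.",
    "chosen": (
        "DPO fine-tunes a model directly on preference pairs, "
        "skipping a separate reward model."
    ),
    "rejected": "DPO is a kind of optimizer.",
}
```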
Intended Uses & Limitations
The model is primarily intended for conversational AI and assistant-like applications. Its instruction-tuned base was initially fine-tuned on a diverse mix of human-created instructions and synthetic dialogues. Users should be aware that, unlike some other chat models, Tulu V2.5 has not undergone extensive RLHF alignment for safety filtering, so it may produce problematic outputs, especially when explicitly prompted to do so.

The model expects a specific input format: a `<|user|>` tag, a newline, the user message, a newline, then an `<|assistant|>` tag followed by a newline. Including the newline after `<|assistant|>` is important for generation quality.
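Laid out literally, a prompt looks like this (the trailing newline after `<|assistant|>` is part of the prompt):

```
<|user|>
Your message here!
<|assistant|>
```

A minimal generation sketch with Hugging Face Transformers follows; the sampling settings are illustrative assumptions, not recommendations from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tulu chat format: note the required newline after <|assistant|>.
prompt = "<|user|>\nWrite a haiku about preference learning.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```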