allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024
allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024 is a 13-billion-parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. The model is trained with Direct Preference Optimization (DPO) on the Chatbot Arena 2024 dataset and is intended to act as a helpful assistant. Because it is optimized directly on human preference data, it is suited to conversational AI applications where responses should align with what users prefer.
Tulu V2.5 DPO 13B - Chatbot Arena 2024 Overview
This model is part of the Tulu V2.5 series by AllenAI, designed as a helpful assistant. It is a 13 billion parameter model, fine-tuned from meta-llama/Llama-2-13b-hf using Direct Preference Optimization (DPO). The training specifically leveraged the Chatbot Arena 2024 dataset, which consists of human preference data, to align its responses with user preferences. This approach builds upon the Tulu 2 suite, incorporating DPO and PPO techniques.
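Concretely, DPO fine-tunes the policy directly on preference pairs: it increases the log-probability margin of the chosen response over the rejected one, relative to a frozen reference model, with no separate reward model. A minimal sketch of the standard DPO loss in PyTorch (the function and the beta value are illustrative, not AllenAI's actual training code):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss. Each argument is a tensor of per-sequence
    log-probabilities, log p(completion | prompt), summed over tokens."""
    # Implicit rewards: how much the policy favors each completion
    # relative to the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response out-ranks the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```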
Key Capabilities
- Preference-aligned Generation: Trained on human preference data from Chatbot Arena 2024, it is optimized to produce responses that users prefer (a hypothetical preference record is sketched after this list).
- Assistant-like Behavior: Designed to function as a helpful conversational assistant.
- DPO Fine-tuning: Utilizes Direct Preference Optimization for robust alignment.
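To make the preference data concrete, a Chatbot Arena-style record pairs one prompt with two candidate responses and a human verdict. A hypothetical example; the field names are assumptions, not the exact dataset schema:

```python
# Hypothetical preference record; field names are illustrative.
preference_example = {
    "prompt": "Explain DPO in one sentence.",
    "chosen": (
        "DPO fine-tunes a model directly on preference pairs, "
        "skipping a separate reward model."
    ),
    "rejected": "DPO is a kind of optimizer.",
}
```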
Intended Uses & Limitations
The model is primarily intended for conversational AI and assistant-like applications. Its instruction-tuned base was initially fine-tuned on a diverse mix of human-created instructions and synthetic dialogues. Users should be aware that, unlike some other chat models, Tulu V2.5 has not undergone extensive RLHF alignment for safety filtering, so it may produce problematic outputs, especially when explicitly prompted to do so.

The model expects a specific input format: a `<|user|>` tag, a newline, the user message, a newline, then an `<|assistant|>` tag followed by a newline. Including the newline after `<|assistant|>` is important for generation quality.
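Laid out literally, a prompt looks like this (the trailing newline after `<|assistant|>` is part of the prompt):

```
<|user|>
Your message here!
<|assistant|>
```

A minimal generation sketch with Hugging Face Transformers follows; the sampling settings are illustrative assumptions, not recommendations from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tulu chat format: note the required newline after <|assistant|>.
prompt = "<|user|>\nWrite a haiku about preference learning.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```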