Name: allenai/tulu-2-dpo-13b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: allenai

Tulu V2 DPO 13B: A DPO-Aligned Llama 2 Assistant

allenai/tulu-2-dpo-13b is a 13-billion-parameter language model developed by AllenAI as a fine-tuned version of Llama 2 13B. It was trained with Direct Preference Optimization (DPO) on a mix of publicly available, synthetic, and human-created datasets, with the goal of acting as a helpful assistant.

Key Capabilities & Features

DPO Alignment: Utilizes Direct Preference Optimization for enhanced conversational quality and alignment, drawing inspiration from the Zephyr Beta model's DPO recipe.
Strong Performance: Achieves a 7.00 MT-Bench score and an 89.5% AlpacaEval win rate, positioning it as a robust alternative to Llama 2 13B Chat.
Diverse Training Data: Fine-tuned on a filtered version of the Tulu V2 mix dataset, which includes human-created instructions and synthetic dialogues, further aligned with the openbmb/UltraFeedback dataset (64k prompts ranked by GPT-4).
Instruction Following: Optimized for chat-based interactions and expects a specific input format, <|user|>\nYour message here!\n<|assistant|>\n, where each \n is a literal newline, including the one after <|assistant|>.
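
The input format above is easy to get subtly wrong, for example by dropping the final newline. A minimal sketch of building a single-turn prompt is shown here; the helper name build_tulu_prompt is hypothetical and not part of any library, and multi-turn formatting is deliberately left out because the text above only specifies a single turn.

```python
def build_tulu_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn format quoted above.

    Hypothetical helper for illustration; it only does string formatting.
    """
    # Strip surrounding whitespace so the only newlines are the ones
    # the format itself requires.
    body = user_message.strip()
    # The prompt must end with a newline after <|assistant|>.
    return f"<|user|>\n{body}\n<|assistant|>\n"


if __name__ == "__main__":
    prompt = build_tulu_prompt("Your message here!")
    # repr() makes the newlines visible for inspection.
    print(repr(prompt))
```

The resulting string is what would be passed as the raw prompt to whatever completion interface is in use; the model's reply is the text generated after the final newline.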

Intended Use Cases

Chatbot Development: Ideal for creating helpful and aligned conversational AI agents.
Assistant Applications: Suitable for tasks requiring detailed instruction following and natural language understanding.
Research & Development: Provides a strong base for further experimentation with DPO-aligned models.
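
For readers experimenting with DPO, the standard DPO objective for a single preference pair can be sketched as follows. This is an illustration of the general technique, not AllenAI's training code: the function name, the scalar log-probability inputs, and the beta value of 0.1 are all assumptions made for the example.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    beta = 0.1 is an illustrative default, not a value taken from the text.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0.
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy favors the chosen response more than the reference does.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss shrinks as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference model, which is why no separate reward model is needed.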

Limitations

The model has not undergone extensive safety alignment beyond the DPO phase, meaning it may produce problematic outputs, especially when explicitly prompted to do so.
The exact composition of the base Llama 2 training corpus is unknown, so the model may carry biases inherited from that data.