allenai/tulu-2-dpo-13b
Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Nov 13, 2023 | License: other | Architecture: Transformer

allenai/tulu-2-dpo-13b is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama 2 using Direct Preference Optimization (DPO). It is designed as a helpful assistant for chat-based interactions and is positioned as an alternative to Llama 2 13B Chat. It scores 7.00 on MT-Bench and reaches an 89.5% win rate on AlpacaEval.

Tulu V2 DPO 13B: A DPO-Aligned Llama 2 Assistant

allenai/tulu-2-dpo-13b is a 13 billion parameter language model developed by AllenAI, built upon the Llama 2 architecture. It is specifically fine-tuned using Direct Preference Optimization (DPO) on a diverse mix of publicly available, synthetic, and human-created datasets, aiming to function as a highly helpful assistant.
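
For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the training code used for this model: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example. Each argument is the summed log-probability of a full response under either the policy being trained or the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities (illustrative sketch)."""
    # How much more the policy likes each response than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled preference margin; beta is an assumed, illustrative value.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin), computed in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# The loss drops below log(2) once the policy favors the chosen response
# more strongly than the reference model does.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0) < math.log(2))
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals log 2; training pushes the margin up so that preferred responses become relatively more likely.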

Key Capabilities & Features

  • DPO Alignment: Utilizes Direct Preference Optimization for enhanced conversational quality and alignment, drawing inspiration from the Zephyr Beta model's DPO recipe.
  • Strong Performance: Achieves a 7.00 MT-Bench score and an 89.5% AlpacaEval win rate, positioning it as a robust alternative to Llama 2 13B Chat.
  • Diverse Training Data: Fine-tuned on a filtered version of the Tulu V2 mix dataset, which includes human-created instructions and synthetic dialogues, then further aligned with the openbmb/UltraFeedback dataset (64k prompts with model completions ranked by GPT-4).
  • Instruction Following: Optimized for chat-based interactions, requiring a specific input format (<|user|>\nYour message here!\n<|assistant|>\n).
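
The input format quoted in the list above can be produced with a small helper. This is a minimal sketch for a single-turn prompt; the function name `format_tulu_prompt` is hypothetical and not part of any library. Note that the string ends with a newline after `<|assistant|>`, exactly as the quoted format shows.

```python
def format_tulu_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn format quoted above (hypothetical helper)."""
    # User tag, message, assistant tag, each followed by a newline.
    return f"<|user|>\n{user_message}\n<|assistant|>\n"

prompt = format_tulu_prompt("Your message here!")
print(repr(prompt))  # '<|user|>\nYour message here!\n<|assistant|>\n'
```

The resulting string is what gets passed to the model as the raw prompt; the model's reply is whatever it generates after the final newline.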

Intended Use Cases

  • Chatbot Development: Ideal for creating helpful and aligned conversational AI agents.
  • Assistant Applications: Suitable for tasks requiring detailed instruction following and natural language understanding.
  • Research & Development: Provides a strong base for further experimentation with DPO-aligned models.

Limitations

  • The model has not undergone extensive safety alignment beyond the DPO phase, meaning it may produce problematic outputs, especially when explicitly prompted to do so.
  • The exact composition of the base Llama 2 training corpus is unknown, so the model may carry biases inherited from that data.