Name: allenai/tulu-2-dpo-70b API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: allenai

Overview

allenai/tulu-2-dpo-70b is a 69-billion-parameter instruction-tuned chat model developed by the Allen Institute for AI (AI2). It is a fine-tuned version of meta-llama/Llama-2-70b-hf, trained using Direct Preference Optimization (DPO) on a diverse mix of publicly available, synthetic, and human-created datasets. The model is designed to act as a helpful assistant and is presented as a strong alternative to Llama 2 70b Chat.

Key Capabilities & Performance

Architecture: Fine-tuned Llama 2 70B model.
Training Method: Utilizes Direct Preference Optimization (DPO) for alignment, building on a recipe similar to Zephyr Beta.
Dataset: Initial fine-tuning on the Tulu V2 mix dataset (human-created instructions and synthetic dialogues), followed by DPO alignment on the openbmb/UltraFeedback dataset (64k prompts ranked by GPT-4).
Performance: Scores 7.89 on MT-Bench and reaches a 95.1% win rate on AlpacaEval, indicating strong conversational and instruction-following capabilities.

Intended Uses & Limitations

Primary Use: Designed for conversational AI and acting as a helpful assistant.
Input Format: Expects a turn-based format with separate user and assistant turns. A newline must follow <|assistant|>, because omitting it can noticeably reduce generation quality.
Bias and Risks: The model has not undergone extensive safety alignment or in-the-loop filtering like ChatGPT, meaning it can produce problematic outputs, especially when explicitly prompted to do so. Users should be aware of potential biases inherited from its base model and training data.
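
The input format described above can be sketched as a small helper. This is a minimal, single-turn illustration, not an official client: the function name build_tulu_prompt is hypothetical, and the <|user|> tag is an assumption taken from the upstream model card's convention, since this page names only <|assistant|>. Check the full model card before relying on it.

```python
def build_tulu_prompt(user_message: str) -> str:
    """Format a single user message for a Tulu-style chat model.

    Assumption: the user turn is introduced by a "<|user|>" tag on its own
    line (only "<|assistant|>" is named on this page). The trailing newline
    after "<|assistant|>" is the part this page calls out as required.
    """
    # Strip stray whitespace so the tags always sit on their own lines.
    message = user_message.strip()
    return f"<|user|>\n{message}\n<|assistant|>\n"


if __name__ == "__main__":
    prompt = build_tulu_prompt("Summarize Direct Preference Optimization in one sentence.")
    # repr() makes the newline characters visible when inspecting the prompt.
    print(repr(prompt))
```

The returned string would be sent as the raw prompt to a text-completion endpoint. Clients that apply the model's chat template automatically should not need this helper.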
