Tulu V2.5 DPO 13B - AlpacaFarm Human Preferences
This model is a 13-billion-parameter language model from AllenAI, fine-tuned from meta-llama/Llama-2-13b-hf. It belongs to the Tulu V2.5 suite, a collection of models aligned with preference-learning methods (DPO and PPO) across a range of preference datasets to build helpful assistant models.
Key Capabilities & Training
- Preference Alignment: The model is trained with Direct Preference Optimization (DPO) on the `alpaca_farm_human_pref` dataset to align its outputs with human preferences (see the loss sketch after this list).
- Base Model: It builds on the Tulu 2 suite, which was first fine-tuned on a filtered mix of publicly available, synthetic, and human-created datasets.
- Input Format: Designed to work with the chat template `<|user|>`, your message, then `<|assistant|>`, each on its own line with a trailing newline after `<|assistant|>`, for optimal generation quality; see the generation example below.
- Research Focus: This model is a product of research detailed in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback," which explores effective methods for learning from preference feedback.
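For readers unfamiliar with DPO, here is a minimal sketch of the objective in PyTorch. The function name, argument layout, and the `beta` default are illustrative, not taken from the training code; the paper tunes such hyperparameters per dataset.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (sketch, not the released code).

    Each argument is the summed log-probability of the chosen or rejected
    response under the trained policy or the frozen reference model.
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```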
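And a minimal generation sketch using Hugging Face `transformers`, showing the chat template in practice. The repository id below is an assumption; verify the exact name on the Hub before use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id -- verify the exact name on the Hugging Face Hub.
MODEL_ID = "allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tulu chat template: the trailing newline after <|assistant|>
# can noticeably affect generation quality.
prompt = "<|user|>\nWrite a haiku about alignment research.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```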
Intended Uses & Limitations
This model is intended for use as a helpful assistant, particularly in scenarios where alignment with human preferences matters. Note, however, that the Tulu models have not undergone the in-the-loop safety filtering applied to some commercial models, so they can produce problematic outputs when prompted to do so. Users should be aware of these limitations and put appropriate safeguards in place before deployment.