allenai/tulu-v2.5-dpo-13b-shp2
allenai/tulu-v2.5-dpo-13b-shp2 is a 13 billion parameter language model developed by AllenAI, fine-tuned from Meta's Llama-2-13b-hf. This model is part of the Tulu V2.5 series, specifically aligned using DPO (Direct Preference Optimization) on the SHP-2 dataset. It is designed to function as a helpful assistant, building upon the Tulu 2 suite of RLHF-tuned chat models.
Model Overview
This checkpoint belongs to the Tulu V2.5 series, a collection of models fine-tuned with Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) on top of the Tulu 2 suite of chat models. This particular variant applies DPO to the SHP-2 preference data, starting from the meta-llama/Llama-2-13b-hf base model.
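For context, DPO optimizes the policy directly against pairwise preference data instead of training a separate reward model and running RL. Below is a minimal PyTorch sketch of the DPO objective; the function name, argument layout, and beta value are illustrative and not taken from the Tulu training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss (Rafailov et al., 2023).

    Each argument is a batch of summed log-probabilities of the chosen or
    rejected response under the trainable policy or the frozen reference
    model. beta=0.1 is a common default, not necessarily the value used
    for this checkpoint.
    """
    # Implicit rewards: how much the policy has moved away from the
    # reference model on each response.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```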
Key Capabilities & Training
- Assistant-Oriented: Designed to act as a helpful assistant, leveraging RLHF tuning.
- DPO Alignment: Trained with Direct Preference Optimization on the shp_2 split of the allenai/tulu-2.5-preference-data dataset.
- Input Format: Requires the Tulu chat template for best results: `<|user|>`, your message on the next line, then `<|assistant|>` followed by a newline (see the usage sketch after this list).
- Research Focus: Developed as part of research into disentangling best practices for learning from preference feedback, detailed in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback".
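A minimal usage sketch with the Hugging Face transformers library, assuming a recent transformers release and accelerate installed for `device_map="auto"`; the generation settings are illustrative defaults, not values recommended by the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-v2.5-dpo-13b-shp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # requires accelerate; shards across available GPUs
    torch_dtype="auto",
)

# Tulu-style chat template. The trailing newline after <|assistant|>
# can noticeably affect generation quality.
prompt = "<|user|>\nWhat is Direct Preference Optimization?\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256,
                         do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```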
Limitations
- Safety Alignment: The model has not undergone dedicated safety alignment during the RLHF phase and is not deployed with in-the-loop response filtering, so it can produce problematic outputs, especially when prompted to do so.