Name: allenai/tulu-v2.5-dpo-13b-helpsteer API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: allenai

Model Overview

allenai/tulu-v2.5-dpo-13b-helpsteer is a 13-billion-parameter language model developed by AllenAI, building upon the Tulu V2 suite. It is fine-tuned from meta-llama/Llama-2-13b-hf and then aligned using Direct Preference Optimization (DPO) on the HelpSteer dataset. It belongs to a collection of RLHF-tuned chat models designed to act as helpful assistants.
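The sketch below shows the standard per-example DPO objective for readers unfamiliar with it. The inputs are four summed sequence log-probabilities: the trained policy and a frozen reference model, each scored on the preferred (chosen) and dispreferred (rejected) response. This illustrates the technique only and is not AllenAI's training code. The function names and the default beta of 0.1 are assumptions made for the example.

```python
import math


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).

    Hypothetical helper for illustration; beta=0.1 is an assumed default.
    """
    # How much more (in log space) the policy favors the chosen response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    # The same quantity for the rejected response
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) equals softplus(-z)
    return _softplus(-logits)
```

When the policy and reference agree exactly, the loss is log 2. It falls below that value once the policy favors the chosen response more strongly than the reference does, which is the direction preference training pushes toward.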

Key Characteristics

Base Model: Fine-tuned from Llama-2-13b-hf.
Alignment Method: Utilizes DPO (Direct Preference Optimization) for alignment, as detailed in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback".
Training Data: Initially fine-tuned on a mix of publicly available, synthetic, and human-created datasets, then further aligned on the helpsteer split of the Tulu 2.5 preference data.
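Input Format: Expects a specific chat format in which each element sits on its own line: <|user|>, then the message (e.g. "Your message here!"), then <|assistant|>, followed by a newline. The newline after <|assistant|> is crucial; omitting it can noticeably degrade generation quality.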
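The following minimal Python helper builds a single-turn prompt in this format. It assumes, as in the published Tulu model card, that each role tag sits on its own line and that the prompt ends with a newline after <|assistant|>. The function name format_tulu_prompt is hypothetical. Multi-turn conversation formatting is not covered here.

```python
def format_tulu_prompt(message: str) -> str:
    """Wrap one user message in the Tulu chat format (hypothetical helper)."""
    # Strip stray surrounding whitespace so the message occupies exactly one block.
    text = message.strip()
    # Each role tag gets its own line; the trailing newline after <|assistant|>
    # is where the model begins generating its reply.
    return f"<|user|>\n{text}\n<|assistant|>\n"


if __name__ == "__main__":
    print(format_tulu_prompt("Your message here!"))
```

The resulting string is what would be passed as the raw prompt to whichever generation interface serves the model.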

Intended Uses

This model is primarily intended for use as a helpful assistant in conversational AI applications. Developers should be aware that, like other Tulu models, it was not aligned to generate safe completions during the RLHF phase and is not deployed with in-the-loop filtering of responses, so it can produce problematic outputs, especially when prompted to do so.
