Name: allenai/tulu-v2.5-dpo-13b-nectar API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: allenai

Model Overview

allenai/tulu-v2.5-dpo-13b-nectar is a 13-billion-parameter language model from the Tulu V2.5 series, developed by AllenAI. It is fine-tuned from meta-llama/Llama-2-13b-hf using Direct Preference Optimization (DPO) on the Nectar dataset and is intended to act as a helpful assistant. The model belongs to a suite of RLHF-tuned chat models trained on a mix of publicly available, synthetic, and human-created datasets.

Key Characteristics

Base Model: Fine-tuned from Llama-2-13b-hf.
Alignment Method: Aligned with DPO (Direct Preference Optimization), as described in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback".
Training Data: Trained on the Nectar split of the allenai/tulu-2.5-preference-data dataset, building upon an initial fine-tuning on a filtered Tulu V2 mix dataset.
Input Format: For best generation quality, prompts should follow the model's chat template: <|user|> Your message here! <|assistant|>
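
The chat template above can be applied with a small helper. The following is a minimal sketch, not the official implementation: the function name `format_tulu_prompt` is hypothetical, and the newline placement (each role tag on its own line, with a trailing newline after the final <|assistant|> tag) is an assumption, since the template is shown here on a single line. The upstream model card is the place to confirm the exact whitespace.

```python
def format_tulu_prompt(messages):
    """Render chat messages into a Tulu-style prompt string.

    `messages` is a list of {"role": ..., "content": ...} dicts whose roles
    are "user" or "assistant". The newline layout is an assumption; the
    listing only shows the template flattened onto one line.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("the conversation must end with a user message")

    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn: role tag, newline, content, newline.
        parts.append(f"<|{role}|>\n{message['content'].strip()}\n")

    # End with an open assistant tag so the model generates the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_tulu_prompt([{"role": "user", "content": "Your message here!"}])
```

The resulting string can be passed as a raw prompt to whichever inference client is in use; if the tokenizer ships its own chat template, that template is likely the safer choice.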

Intended Use and Limitations

This model is intended for use as a helpful assistant, primarily in English. Like other Tulu models, it has not undergone extensive safety alignment during the RLHF phase, and it does not use in-the-loop filtering of responses. As a result, it may produce problematic outputs, especially when explicitly prompted to do so. Users should keep these limitations in mind, including the potential for biased or otherwise harmful output.
