Model Overview
allenai/tulu-v2.5-dpo-13b-nectar is a 13 billion parameter language model from the Tulu V2.5 series, developed by AllenAI. It is fine-tuned from meta-llama/Llama-2-13b-hf using Direct Preference Optimization (DPO) on the Nectar dataset and is intended to act as a helpful assistant. The model belongs to a suite of RLHF-tuned chat models trained on a mix of publicly available, synthetic, and human-created datasets.
Key Characteristics
- Base Model: Fine-tuned from Llama-2-13b-hf.
- Alignment Method: Direct Preference Optimization (DPO), as described in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback".
- Training Data: Trained on the Nectar split of the allenai/tulu-2.5-preference-data dataset, building upon an initial fine-tuning on a filtered Tulu V2 mix dataset.
- Input Format: Designed to use a specific chat template for optimal generation quality: <|user|> Your message here! <|assistant|>
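The chat template can be applied with a small helper before text is passed to the model. The following is a minimal sketch, not the model's official tooling: the function name format_tulu_prompt and its sep parameter are hypothetical. The default of a newline after each tag is an assumption based on how Tulu-style templates are usually written (the template above appears on a single line), so it is worth checking the model's tokenizer configuration before relying on it.

```python
def format_tulu_prompt(user_message: str, sep: str = "\n") -> str:
    """Wrap one user message in the <|user|> / <|assistant|> template.

    `sep` is the separator placed after each tag and after the message.
    The newline default is an assumption, not something confirmed by
    this document; pass sep=" " to reproduce the single-line form.
    """
    # Strip stray whitespace so the separators are the only padding.
    message = user_message.strip()
    return f"<|user|>{sep}{message}{sep}<|assistant|>{sep}"


if __name__ == "__main__":
    # Show the exact string, including separators, that would be tokenized.
    print(repr(format_tulu_prompt("Your message here!")))
```

The returned string is the full prompt; the model's reply is whatever it generates after the final <|assistant|> tag.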
Intended Use and Limitations
This model is intended for use as a helpful assistant, primarily in English. Like other Tulu models, it did not undergo extensive safety alignment during the RLHF phase and is not paired with in-the-loop filtering of responses. Consequently, it may produce problematic outputs, especially when explicitly prompted to do so. Users should account for these limitations, including potential bias and other risks, before deploying it.