allenai/tulu-v2.5-dpo-13b-nectar
allenai/tulu-v2.5-dpo-13b-nectar is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series and was trained with DPO (Direct Preference Optimization) on the Nectar dataset to function as a helpful assistant. The model is tuned to produce preference-aligned responses in English, and its training draws on a mix of publicly available, synthetic, and human-created datasets.
Model Overview
allenai/tulu-v2.5-dpo-13b-nectar is a 13 billion parameter language model from the Tulu V2.5 series, developed by AllenAI. It is fine-tuned from meta-llama/Llama-2-13b-hf using Direct Preference Optimization (DPO) on the Nectar dataset and is intended to act as a helpful assistant. It belongs to a suite of RLHF-tuned chat models trained on a mix of publicly available, synthetic, and human-created datasets.
Key Characteristics
- Base Model: Fine-tuned from Llama-2-13b-hf.
- Alignment Method: Utilizes DPO (Direct Preference Optimization) for alignment, as detailed in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback".
- Training Data: Trained on the Nectar split of the allenai/tulu-2.5-preference-data dataset, building upon an initial fine-tuning on a filtered Tulu V2 mix dataset.
- Input Format: Designed to use a specific chat template: <|user|> Your message here! <|assistant|> for optimal generation quality.
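As a minimal sketch of the input format described above, the helper below wraps a single user message in the chat template. The function name build_prompt is hypothetical. The newline placement (after <|user|>, after the message, and after <|assistant|>) is an assumption, because the template appears flattened onto one line here; check it against the upstream model card before relying on it.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(user_message: str) -> str:
    """Wrap one user message in the Tulu-style chat template.

    Assumption: each tag and the message are followed by a newline.
    The flattened template in this document does not show line breaks.
    """
    text = user_message.strip()
    if not text:
        raise ValueError("user_message must not be empty")
    # The trailing assistant tag cues the model to start its reply.
    return f"{USER_TAG}\n{text}\n{ASSISTANT_TAG}\n"
```

The returned string would then be tokenized and passed to whatever inference library is in use; multi-turn conversations and system prompts are outside the scope of this sketch.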
Intended Use and Limitations
This model is intended for use as a helpful assistant, primarily in English. Like other Tulu models, it was not extensively safety-aligned during the RLHF phase and does not include in-the-loop filtering of its responses. Consequently, it may produce problematic outputs, especially when explicitly prompted to do so. Users should keep these limitations in mind, including the potential for biased or otherwise risky outputs.