allenai/tulu-v2.5-dpo-13b-shp2

TEXT GENERATIONConcurrency Cost:1Model Size:13BQuant:FP8Ctx Length:4kPublished:Jun 11, 2024License:apache-2.0Architecture:Transformer Open Weights Cold

allenai/tulu-v2.5-dpo-13b-shp2 is a 13 billion parameter language model developed by AllenAI, fine-tuned from Meta's Llama-2-13b-hf. This model is part of the Tulu V2.5 series, specifically aligned using DPO (Direct Preference Optimization) on the SHP-2 dataset. It is designed to function as a helpful assistant, building upon the Tulu 2 suite of RLHF-tuned chat models.

Loading preview...

Model Overview

allenai/tulu-v2.5-dpo-13b-shp2 is a 13 billion parameter language model developed by AllenAI, serving as a helpful assistant. It is a member of the Tulu V2.5 series, which are models fine-tuned using Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO), originating from the Tulu 2 suite. This specific iteration is trained on the SHP-2 dataset using DPO, building upon the meta-llama/Llama-2-13b-hf base model.

Key Capabilities & Training

Limitations

  • Safety Alignment: The model has not undergone extensive safety alignment during the RLHF phase, and lacks in-the-loop filtering, meaning it can produce problematic outputs, especially when prompted to do so.