allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Jun 11, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023 is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series and was trained with DPO (Direct Preference Optimization) on the Chatbot Arena 2023 dataset. The model is intended to act as a helpful assistant, using preference feedback to improve its conversational behavior.

Model Overview

allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023 is a 13 billion parameter language model developed by AllenAI, building upon the Tulu 2 suite. It is fine-tuned from meta-llama/Llama-2-13b-hf and specifically trained using Direct Preference Optimization (DPO) on the Chatbot Arena 2023 dataset. This model is designed to act as a helpful assistant, leveraging real-world conversational data for alignment.
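
For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO objective, written in plain Python for illustration. It is not AllenAI's training code. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for this example. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    beta=0.1 is an illustrative default, not a value taken from this model card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, which is how preference pairs such as those in Chatbot Arena can steer a model without a separate reward model.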

Key Capabilities & Training

  • Assistant-Oriented: Trained to provide helpful responses in a conversational context.
  • DPO Alignment: Utilizes Direct Preference Optimization (DPO) for alignment, starting from a base fine-tuned on a mix of public, synthetic, and human-created datasets (Tulu V2 mix).
  • Chatbot Arena Data: Further aligned using the Chatbot Arena 2023 dataset, which consists of real-world chatbot conversations.
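  • Input Format: Expects a specific chat format: <|user|>, a newline, your message, a newline, then <|assistant|> followed by a newline. The newlines are part of the format, including the one after <|assistant|>, and omitting them can noticeably reduce generation quality.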
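
The input format above can be sketched as a small helper. This is an illustrative example, not part of any official library: the function name `format_tulu_prompt` is hypothetical, and the handling of multi-turn history (repeating the same tags for earlier turns) is an assumption.

```python
def format_tulu_prompt(turns):
    """Build a Tulu-style prompt from a list of (role, text) pairs.

    Roles must be "user" or "assistant". The prompt always ends with an
    open <|assistant|> tag plus a newline, ready for the model to continue.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn is the role tag, a newline, the text, and a newline.
        parts.append(f"<|{role}|>\n{text.strip()}\n")
    # Trailing assistant tag with its newline, which affects output quality.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_tulu_prompt([("user", "Your message here!")])
```

The resulting string can be passed as the raw prompt to whatever completion interface serves the model.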

Intended Uses & Limitations

This model is suitable for chat-based applications requiring an assistant-like persona. However, the Tulu models were not aligned to generate safe completions during the RLHF phase, and they are not deployed with in-the-loop filtering of responses, so they can produce problematic outputs, especially when prompted to do so. Users should be aware of the potential biases and risks inherent in models trained on broad internet data.