allenai/tulu-v2.5-dpo-13b-stackexchange

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Jun 11, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

allenai/tulu-v2.5-dpo-13b-stackexchange is a 13-billion-parameter language model developed by AllenAI and fine-tuned from Llama-2-13b-hf. It was trained with DPO on 500k samples from the StackExchange paired dataset to produce helpful, assistant-like responses. It is part of the Tulu V2.5 series, which focuses on learning from preference feedback to improve conversational quality.


Model Overview

allenai/tulu-v2.5-dpo-13b-stackexchange is a 13-billion-parameter language model from AllenAI, part of the Tulu V2.5 series. It is fine-tuned from meta-llama/Llama-2-13b-hf and aligned using Direct Preference Optimization (DPO) on 500,000 samples from the StackExchange paired dataset. The goal of this training is a model that acts as a helpful assistant by learning from preference feedback, as detailed in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback".
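For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines. This is the standard DPO loss as commonly stated in the literature, not code from the Tulu V2.5 training pipeline. The function name `dpo_loss`, the example log-probabilities, and the value `beta=0.1` are illustrative assumptions and are not taken from this model card.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed illustrative value, not the one used for Tulu V2.5.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_gain - rejected_gain)

    # loss = -log(sigmoid(x)), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy favours the chosen answer more than the reference does: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls as the policy raises the likelihood of the preferred answer relative to the rejected one, measured against the reference model. That is how paired preference data such as the StackExchange set is used as a training signal.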

Key Capabilities

  • Helpful Assistant: Designed to generate helpful and assistant-like responses through DPO training.
  • StackExchange Data Alignment: Preference training on StackExchange data is intended to improve its handling of technical questions and the informativeness of its answers.
  • Preference Learning: Leverages DPO to learn from paired preference data, improving response quality based on desired outcomes.
  • Apache 2.0 License: Available under a permissive Apache 2.0 license, allowing for broad use.

Intended Uses & Limitations

This model is suitable for applications requiring a conversational AI assistant, particularly where the knowledge domain aligns with the StackExchange dataset. Like many LLMs, it has not been explicitly aligned for safety during the RLHF phase, nor is it deployed with in-the-loop filtering of responses, so it may produce problematic outputs, especially if prompted to do so. For best generation quality, format inputs as: <|user|> Your message here! <|assistant|>
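The input format can be assembled programmatically. The following is a minimal sketch, not official tooling: `build_prompt` is a hypothetical helper, and the newline placement (each tag on its own line, with a trailing newline after `<|assistant|>`) is an assumption based on the convention used by other Tulu model cards, not something stated in the text above.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(message: str) -> str:
    """Wrap a single user message in the Tulu-style chat format.

    Assumption: tags and message are separated by newlines, and the prompt
    ends with a newline after the assistant tag so generation starts cleanly.
    """
    return f"{USER_TAG}\n{message.strip()}\n{ASSISTANT_TAG}\n"


if __name__ == "__main__":
    # Show the exact string, including newlines, that would be sent to the model.
    print(repr(build_prompt("How do I reverse a list in Python?")))
```

The resulting string would be passed as the raw prompt to whichever inference backend serves the model. If the backend applies its own chat template, this manual wrapping should be skipped to avoid duplicating the tags.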