allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Jun 11, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref model is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series and was trained using Direct Preference Optimization (DPO) on the AlpacaFarm human preferences dataset. The model is designed to act as a helpful assistant whose responses are aligned with human preferences.


Tulu V2.5 DPO 13B - AlpacaFarm Human Preferences

This model is a 13 billion parameter language model from AllenAI, fine-tuned from meta-llama/Llama-2-13b-hf. It belongs to the Tulu V2.5 suite, a series of helpful assistant models trained with preference-based alignment methods (DPO and PPO).
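As a rough illustration of the alignment method used for this model, Direct Preference Optimization (DPO) scores each preference pair by how much more the trained policy favors the chosen response over the rejected one, relative to a frozen reference model. The sketch below computes the standard DPO loss for a single pair from sequence log-probabilities. The function name, the example numbers, and the beta value of 0.1 are illustrative assumptions, not the settings used to train this model.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All inputs are summed log-probabilities of a full response.
    beta=0.1 is an assumed, illustrative value.
    """
    # How much the policy has moved toward each response vs. the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)

    # loss = -log(sigmoid(margin)), written as a numerically stable softplus.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
unchanged = dpo_pair_loss(-5.0, -7.0, -5.0, -7.0)   # policy == reference
improved = dpo_pair_loss(-4.0, -8.0, -5.0, -7.0)    # policy favors chosen more
print(unchanged, improved)
```

When the policy and the reference agree, the loss equals log 2 (about 0.693); it falls as the policy shifts probability toward the chosen response and away from the rejected one.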

Key Capabilities & Training

  • Preference Alignment: The model is specifically trained using Direct Preference Optimization (DPO) on the alpaca_farm_human_pref dataset, aiming to align its outputs with human preferences.
  • Base Model: It builds upon the Tulu 2 suite, whose models were first fine-tuned on a filtered mix of publicly available, synthetic, and human-created datasets.
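  • Input Format: Designed to work with a specific chat template, with each part on its own line: <|user|>, then your message, then <|assistant|> followed by a newline. Following this format, including the final newline, matters for generation quality.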
  • Research Focus: This model is a product of research detailed in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback," exploring effective methods for learning from preference feedback.
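The input format above can be sketched as a small helper. This is a minimal example that assumes, as in the upstream Tulu model cards, that each tag sits on its own line and that a newline follows <|assistant|>. The names build_prompt, USER_TAG, and ASSISTANT_TAG are hypothetical and not part of any library.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(message: str) -> str:
    """Wrap a single user message in the Tulu chat template."""
    # Each tag goes on its own line. The trailing newline after the
    # assistant tag (assumed here) marks where generation should start.
    return f"{USER_TAG}\n{message.strip()}\n{ASSISTANT_TAG}\n"


prompt = build_prompt("Your message here!")
print(repr(prompt))
# → '<|user|>\nYour message here!\n<|assistant|>\n'
```

The resulting string can be passed as-is to whichever text-generation interface serves the model; the model's completion is whatever follows the final newline.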

Intended Uses & Limitations

This model is intended for use as a helpful assistant, particularly in scenarios where alignment with human preferences matters. However, the Tulu models have not undergone in-the-loop safety filtering like some commercial models, so they can produce problematic outputs if prompted to do so. Users should be aware of this limitation and implement appropriate safeguards before deployment.