allenai/tulu-v2.5-dpo-13b-capybara

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Jun 11, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The allenai/tulu-v2.5-dpo-13b-capybara model is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series, specifically trained using DPO (Direct Preference Optimization) on the Capybara 7k dataset to function as a helpful assistant. This model specializes in generating assistant-like responses, leveraging preference feedback for alignment.

Model Overview

allenai/tulu-v2.5-dpo-13b-capybara is a 13 billion parameter language model developed by AllenAI, building upon the Tulu V2 suite. It is fine-tuned from meta-llama/Llama-2-13b-hf and specifically aligned using Direct Preference Optimization (DPO) on the Capybara 7k dataset. The model is designed to act as a helpful assistant, incorporating learnings from preference feedback to improve response quality.
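To make the alignment step more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an illustration of the general objective, not AllenAI's training code: the function name, the argument names, and the default beta of 0.1 are assumptions chosen for the example. Each input is the summed log-probability a model assigns to a full response.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    beta=0.1 is an illustrative default, not a value taken from the model card.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy being trained than under the frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy favors the chosen response
# more than the reference does, so the loss falls below log(2).
example = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(example < math.log(2))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log(2). Training pushes the margin up, which lowers the loss.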

Key Characteristics

  • Architecture: Fine-tuned from Llama-2-13b-hf, a 13 billion parameter model.
  • Training Method: Trained with DPO (Direct Preference Optimization) on the Capybara 7k dataset, starting from the Tulu 2 suite. The wider Tulu V2.5 series covers models trained with DPO and PPO.
  • Intended Use: Optimized for generating helpful assistant-like responses in English.
  • Input Format: For best results, format prompts with the chat template <|user|>\nYour message here!\n<|assistant|>\n, where \n is a newline. The trailing newline after <|assistant|> is crucial for output quality. A chat template is included in the tokenizer.
  • License: Released under the Apache 2.0 license.
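The input format described above can be sketched as a small helper for a single user turn. The function name format_prompt is hypothetical, and multi-turn or system messages are deliberately left out because the text here does not specify them.

```python
def format_prompt(user_message: str) -> str:
    """Wrap one user message in the Tulu chat format.

    Hypothetical helper for illustration; handles a single user turn only.
    """
    # The newline after <|assistant|> is intentional: the model card
    # calls it out as important for generation quality.
    return f"<|user|>\n{user_message}\n<|assistant|>\n"


prompt = format_prompt("Your message here!")
print(repr(prompt))
```

In practice, the chat template bundled with the tokenizer is likely the safer route (for example through the tokenizer's apply_chat_template method in the transformers library), since it stays in sync with the model. The helper above only shows what the final string should look like.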

Limitations and Considerations

  • Safety Alignment: The model was not extensively aligned to generate safe completions during the RLHF phase, and it does not include in-the-loop filtering of responses. It can therefore produce problematic outputs, especially when explicitly prompted to do so.
  • Base Model Data: The exact composition of the Llama 2 base model's training corpus is unknown but likely includes a mix of web data and technical sources.