allenai/open-instruct-llama2-sharegpt-dpo-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Nov 12, 2023 · Architecture: Transformer

allenai/open-instruct-llama2-sharegpt-dpo-7b is a 7-billion-parameter language model from AllenAI's Tulu series. It is a Llama 2 variant first fine-tuned on the ShareGPT dataset and then further optimized with Direct Preference Optimization (DPO) on the UltraFeedback dataset. The model is designed to act as a helpful assistant, primarily in English, and its DPO training makes it particularly strong at generating conversational responses.


Open Instruct ShareGPT DPO Llama2 7B Overview

This model, part of AllenAI's Tulu series, is a 7-billion-parameter language model built on Llama 2 and designed to act as a helpful assistant, primarily in English. Its development involved a two-stage fine-tuning process: supervised training on the ShareGPT dataset, followed by alignment with Direct Preference Optimization (DPO) on the UltraFeedback dataset. The DPO stage uses GPT-4-ranked completions to improve response quality and helpfulness.
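As a quick orientation, here is a minimal sketch of loading the model with Hugging Face transformers and generating a reply. The `<|user|>`/`<|assistant|>` prompt format and the generation settings are assumptions borrowed from other open-instruct releases, so verify them against the model card before relying on them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/open-instruct-llama2-sharegpt-dpo-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~14 GB of weights; fits a single 24 GB GPU
    device_map="auto",          # requires the accelerate package
)

# Assumed prompt template, borrowed from other open-instruct releases.
prompt = "<|user|>\nExplain what DPO training does in two sentences.\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```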

Key Capabilities

  • Helpful Assistant: Optimized to provide informative and coherent responses in a conversational style.
  • DPO Alignment: Benefits from Direct Preference Optimization on GPT-4-ranked UltraFeedback data, improving response quality and alignment with user preferences (see the loss sketch after this list).
  • Llama 2 Base: Built on the robust Llama 2 architecture, providing a strong foundation for general language understanding and generation.
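
The DPO objective itself is compact enough to state in code. The sketch below is a generic illustration of the loss from Rafailov et al. (2023), not code from the Tulu training pipeline: given summed log-probabilities of a chosen and a rejected completion under the policy and a frozen reference model, it pushes the policy to widen the implicit reward margin between them.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is the summed log-probability of a completion; beta
    (0.1 is a common default, not necessarily this model's setting)
    controls how far the policy may drift from the reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the implicit reward margin between chosen and rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy log-probabilities for a single pair, just to show the call shape:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```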

Good For

  • Developing chatbots and virtual assistants requiring helpful, natural dialogue (see the prompt-assembly sketch after this list).
  • Applications where models need to generate high-quality, preference-aligned text.
  • Research into DPO and instruction-tuned models based on Llama 2.
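
For multi-turn chat, prompts for open-instruct models are typically assembled from alternating `<|user|>` and `<|assistant|>` tags, ending with a bare `<|assistant|>` and a newline to cue the model's reply. The `build_prompt` helper below is a hypothetical convenience written on that assumption; confirm the exact template on the model card.

```python
# Hypothetical helper; assumes this model uses the <|user|>/<|assistant|>
# template of other open-instruct releases. Verify against the model card.
def build_prompt(turns):
    """turns: list of (role, text) pairs, role in {"user", "assistant"}."""
    parts = [f"<|{role}|>\n{text}" for role, text in turns]
    parts.append("<|assistant|>")  # trailing tag cues the model to reply
    return "\n".join(parts) + "\n"

prompt = build_prompt([
    ("user", "Can you help me plan a three-day trip to Kyoto?"),
    ("assistant", "Of course. What time of year are you going?"),
    ("user", "Mid-November."),
])
```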

For more technical details, refer to the associated paper: Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2.