Archangel SFT+DPO Llama13b Overview
ContextualAI's archangel_sft-dpo_llama13b is a 13-billion-parameter model built on the Llama architecture. It has been optimized using a combination of Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) loss functions. This dual-optimization approach aims to improve both instruction following and alignment with human preferences.
Key Capabilities & Features
- Advanced Alignment: Aligned using a diverse set of human preference datasets, including SHP, Anthropic HH, and Open Assistant, to improve conversational quality and helpfulness.
- TuluV2 Prompting Format: Designed to be prompted using the TuluV2 format, which clearly delineates user and assistant turns, ensuring consistent interaction.
- Conditional Generation: Models trained with conditional SFT include special `<|good|>` and `<|bad|>` tokens, allowing for controlled generation based on desired sentiment or quality.
- Automatic BOS Token: The tokenizer automatically handles the beginning-of-sequence (BOS) token, simplifying prompt construction for users.
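As a rough illustration of the TuluV2 turn structure described above, the sketch below builds a prompt from a list of turns. It assumes the common TuluV2 convention of `<|user|>` and `<|assistant|>` role markers, each followed by the turn text, with the prompt ending in an open assistant turn for the model to complete; the helper name is hypothetical, and the exact format should be confirmed against the model card.

```python
def build_tulu_prompt(turns):
    """Format (role, text) turns in a TuluV2-style layout.

    Assumption: each turn is rendered as a <|role|> marker on its own
    line followed by the turn text, and the prompt ends with an open
    <|assistant|> marker so the model generates the next reply.
    """
    parts = [f"<|{role}|>\n{text}\n" for role, text in turns]
    parts.append("<|assistant|>\n")  # open turn for the model to fill
    return "".join(parts)


# Single-turn usage:
prompt = build_tulu_prompt([("user", "Summarize DPO in one sentence.")])
# prompt == "<|user|>\nSummarize DPO in one sentence.\n<|assistant|>\n"
```

Because the tokenizer adds the BOS token automatically, no BOS string needs to be prepended to the prompt text itself.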
When to Use This Model
This model is particularly well-suited for applications requiring:
- Conversational AI: Its alignment and prompting format make it ideal for chatbots, virtual assistants, and interactive dialogue systems.
- Preference-Aligned Generation: Useful in scenarios where output quality and adherence to human preferences are critical.
- Controlled Text Generation: The optional control tokens offer a mechanism for guiding the model's output towards specific attributes (e.g., positive or negative sentiment).
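For models in this family trained with conditional SFT, steering works by injecting a control token into the prompt. The sketch below is a minimal, hypothetical illustration: the helper name is invented, and the placement of the token (immediately after the assistant marker) is an assumption to be checked against the model card, since this SFT+DPO variant itself may not include the control tokens.

```python
# Assumed control tokens from the conditional-SFT variants.
CONTROL_TOKENS = {"good": "<|good|>", "bad": "<|bad|>"}


def conditional_prompt(user_text, desired="good"):
    """Build a TuluV2-style prompt conditioned on a control token.

    Assumption: the control token is placed right after the assistant
    marker so that generation continues under that condition. Verify
    the exact convention used during training before relying on this.
    """
    token = CONTROL_TOKENS[desired]
    return f"<|user|>\n{user_text}\n<|assistant|>\n{token}"


# Request a "good" (preferred-style) completion:
prompt = conditional_prompt("Write a friendly greeting.")
```

Swapping `desired="bad"` conditions the model on the dispreferred token instead, which can be useful for probing what the model learned to avoid.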
For further technical details and instructions on training similar models, refer to the ContextualAI HALOs code repository and their blog post.