ContextualAI/archangel_sft-dpo_llama13b

TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 13B
  • Quant: FP8
  • Ctx Length: 4k
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights
  • Warm

ContextualAI/archangel_sft-dpo_llama13b is a 13 billion parameter language model from the Llama family, developed by Contextual AI. It is optimized with Supervised Fine-Tuning (SFT) followed by a Direct Preference Optimization (DPO) loss, and aligned using the SHP, Anthropic HH, and Open Assistant preference datasets. The model is designed for conversational AI, follows the TuluV2 prompting format, and supports optional control tokens for conditional generation.


Archangel SFT+DPO Llama13b Overview

ContextualAI's archangel_sft-dpo_llama13b is a 13 billion parameter model built on the Llama architecture. It has been specifically optimized using a combination of Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) loss functions. This dual-optimization approach aims to enhance both instruction following and alignment with human preferences.

Key Capabilities & Features

  • Advanced Alignment: Aligned using a diverse set of human preference datasets, including SHP, Anthropic HH, and Open Assistant, to improve conversational quality and helpfulness.
  • TuluV2 Prompting Format: Designed to be prompted using the TuluV2 format, which clearly delineates user and assistant turns, ensuring consistent interaction.
  • Conditional Generation: Models trained with conditional SFT include special <|good|> and <|bad|> tokens, allowing for controlled generation based on desired sentiment or quality.
  • Automatic BOS Token: Automatically handles the beginning-of-sequence (BOS) token during tokenization, simplifying prompt construction for users.
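The TuluV2 format described above can be sketched as a small prompt builder. The exact whitespace of the template is an assumption based on the card's description of user/assistant turn markers; note that the BOS token is deliberately omitted from the string, since the model's tokenizer adds it automatically.

```python
def tulu_v2_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the TuluV2 style.

    The <|user|> / <|assistant|> markers follow the TuluV2 convention the
    model card describes; the exact newline placement is an assumption.
    The BOS token is NOT included here because this model's tokenizer
    inserts it automatically during tokenization.
    """
    return f"<|user|>\n{user_message}\n<|assistant|>\n"


# Build a prompt ready to pass to the tokenizer.
prompt = tulu_v2_prompt("Explain the difference between SFT and DPO in one paragraph.")
print(prompt)
```

The resulting string can be tokenized and fed to the model as-is; multi-turn conversations would repeat the user/assistant pattern, but verify the multi-turn convention against the HALOs repository before relying on it.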

When to Use This Model

This model is particularly well-suited for applications requiring:

  • Conversational AI: Its alignment and prompting format make it ideal for chatbots, virtual assistants, and interactive dialogue systems.
  • Preference-Aligned Generation: Useful in scenarios where output quality and adherence to human preferences are critical.
  • Controlled Text Generation: The optional control tokens offer a mechanism for guiding the model's output towards specific attributes (e.g., positive or negative sentiment).
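For the conditionally trained variants, steering works by adding one of the control tokens mentioned above. A minimal sketch, assuming the token is prepended before the TuluV2-formatted turn (check the HALOs repository for the exact convention):

```python
GOOD_TOKEN = "<|good|>"
BAD_TOKEN = "<|bad|>"


def conditioned_prompt(user_message: str, desirable: bool = True) -> str:
    """Prepend a control token to steer conditionally trained SFT variants.

    <|good|> requests a desirable-style completion and <|bad|> an
    undesirable one, per the model card. Placing the token ahead of the
    TuluV2 turn markers is an assumption, not a documented guarantee.
    """
    token = GOOD_TOKEN if desirable else BAD_TOKEN
    return f"{token}<|user|>\n{user_message}\n<|assistant|>\n"


print(conditioned_prompt("Write a friendly greeting."))
```

Only models trained with conditional SFT recognize these tokens; on other variants they would be treated as ordinary text (or unknown tokens) and should be left out.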

For further technical details and instructions on training similar models, refer to the ContextualAI HALOs code repository and their blog post.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each config covers the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p