ContextualAI/archangel_dpo_llama13b
Overview
ContextualAI/archangel_dpo_llama13b is a Llama-13B model developed by Contextual AI and trained with the Direct Preference Optimization (DPO) loss function. It has been aligned with human preferences on a combination of the SHP, Anthropic HH, and Open Assistant datasets, and is part of the Human-Centered Loss Functions (HALOs) research initiative.
Key Capabilities
- Preference Alignment: Optimized with the DPO loss for improved alignment with human feedback (the objective is sketched after this list).
- Conversational Formatting: Designed to be prompted in a TuluV2-consistent format, where user and assistant turns are delineated with <|user|> and <|assistant|> tokens (see the usage sketch below).
- Conditional Generation: Supports optional control tokens, <|good|> and <|bad|>, which can be appended to prompts to steer generation toward desired attributes; these control tokens are included in the tokenizer's embeddings.
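For context, DPO fine-tunes the policy directly on preference pairs instead of training a separate reward model. The standard DPO objective (Rafailov et al., 2023) is reproduced below; the specific hyperparameters used for this checkpoint (e.g. the value of $\beta$) are not stated on this card.

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the preferred and dispreferred responses to prompt $x$, $\pi_{\text{ref}}$ is the reference model, and $\beta$ controls the strength of the implicit KL penalty.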
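Below is a minimal usage sketch with the transformers library, assuming the prompt format described above. The prompt text and generation settings are illustrative, not taken from the model card.

```python
# Minimal sketch: prompting archangel_dpo_llama13b in the TuluV2-consistent
# format. Prompt text and generation settings here are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ContextualAI/archangel_dpo_llama13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# User and assistant turns are delineated with <|user|> and <|assistant|>.
# Per the card, an optional <|good|> or <|bad|> control token may be added
# to the prompt to steer generation.
prompt = "<|user|>\nExplain direct preference optimization in one sentence.\n<|assistant|>\n"

# The tokenizer prepends BOS automatically; do not append EOS to the prompt.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```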
Usage Notes
- The tokenizer automatically adds a beginning-of-sequence (BOS) token when the input is tokenized; no end-of-sequence (EOS) token is added to the prompt (see the check after this list).
- For more technical details on the underlying research and training instructions, refer to the ContextualAI HALOs code repository and their blog post.
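A quick sanity check of that tokenization behavior, assuming the tokenizer loaded in the sketch above:

```python
# Assumes `tokenizer` from the sketch above: tokenization prepends BOS and
# does not append EOS to the prompt.
ids = tokenizer("<|user|>\nHello\n<|assistant|>\n")["input_ids"]
assert ids[0] == tokenizer.bos_token_id   # BOS is added automatically
assert ids[-1] != tokenizer.eos_token_id  # no EOS is appended
```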
Citation
If you use this model or the associated research, please cite the technical report on Human-Centered Loss Functions (HALOs).