Overview
allenai/tulu-v2.5-dpo-13b-nectar-60k is a 13-billion-parameter language model from the Allen Institute for AI (AI2), built on the Llama-2-13b-hf base model. It is a member of the Tulu V2.5 suite, which focuses on building helpful assistant models through preference-based alignment. This variant was trained with Direct Preference Optimization (DPO) on a 60k subsample of the Nectar dataset, as detailed in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback" (arXiv:2406.09279).
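Tulu-family models are trained with simple role markers rather than a free-form prompt. As a minimal sketch, a single-turn prompt can be assembled as below; the exact template should be confirmed against the model's tokenizer/chat-template configuration:

```python
def format_tulu_prompt(user_message: str) -> str:
    """Build a single-turn prompt in the Tulu chat format.

    Tulu models use <|user|> / <|assistant|> role markers; the
    assistant marker is left open so the model generates the reply.
    """
    return f"<|user|>\n{user_message}\n<|assistant|>\n"
```

In practice, prefer `tokenizer.apply_chat_template(...)` from the Hugging Face tokenizer, which applies the template shipped with the model checkpoint.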
Key Capabilities
- Helpful Assistant: Fine-tuned to act as a conversational assistant that responds effectively to user instructions.
- DPO Alignment: Aligned with Direct Preference Optimization, which trains the model directly on pairwise preference data (chosen vs. rejected responses) without a separate reward model.
- Instruction Following: Optimized to understand and execute a diverse range of instructions, drawing on training data that mixes publicly available, synthetic, and human-created datasets.
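The DPO objective mentioned above rewards the policy for widening its log-probability margin on the chosen response relative to a frozen reference model. A minimal per-example sketch (scalar sequence log-probabilities assumed already summed; `beta` is the usual DPO temperature):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * margin difference)."""
    # Implicit rewards: how much more the policy likes each response
    # than the frozen reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written stably with log1p.
    return math.log1p(math.exp(-logits))
```

When the policy and reference agree (zero margins), the loss is log 2; it falls below log 2 only once the policy favors the chosen response more strongly than the reference does.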
Intended Use Cases
- Chatbots and Conversational AI: Ideal for applications requiring an interactive and responsive assistant.
- Instruction-based Tasks: Suitable for scenarios where the model needs to perform specific actions or generate content based on explicit user prompts.
Limitations
- The model was not safety-aligned during preference tuning, so it may produce problematic outputs, especially when prompted to do so. Users should implement their own safety filtering.
- The composition of the Llama 2 pretraining corpus is not public, so biases inherited from the base model cannot be fully characterized.
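Since the safety note above leaves filtering to the deployer, here is a deliberately minimal, hypothetical keyword filter to illustrate where such a check would sit in a serving pipeline (the terms are placeholders; a real deployment should use a proper moderation model, not a keyword list):

```python
# Illustrative only: a keyword blocklist is far too weak for real
# moderation, but shows the shape of a post-generation output check.
BLOCKLIST = ("placeholder_term_a", "placeholder_term_b")  # hypothetical terms

def passes_output_filter(text: str, blocklist=BLOCKLIST) -> bool:
    """Return True if no blocklisted term appears in the model output."""
    lowered = text.lower()
    return not any(term in lowered for term in blocklist)
```

A production setup would typically replace this with a dedicated moderation classifier run on both prompts and completions before anything is shown to the user.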