allenai/tulu-v2.5-ppo-13b-nectar-60k
The allenai/tulu-v2.5-ppo-13b-nectar-60k model is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series, trained with PPO on a 60k subsample of the Nectar dataset and designed to function as a helpful assistant. The model focuses on learning from preference feedback, leveraging a reward model trained on the same Nectar split to enhance its conversational capabilities.
Overview
allenai/tulu-v2.5-ppo-13b-nectar-60k is a 13 billion parameter language model developed by AllenAI, building upon the meta-llama/Llama-2-13b-hf base model. It is a member of the Tulu V2.5 suite, which emphasizes training with DPO and PPO from preference feedback. This specific model was fine-tuned using PPO on a 60,000-sample subset of the Nectar dataset, utilizing a dedicated 13B reward model for alignment.
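Since the model is a standard Llama-2-based causal language model, it should load through the Hugging Face transformers library in the usual way. The following is a minimal loading sketch; the dtype and device settings are illustrative assumptions, not published recommendations:

```python
# Minimal loading sketch using Hugging Face transformers.
# Assumes transformers and accelerate are installed, plus hardware with
# enough memory for 13B weights in float16 (roughly 26 GB).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-v2.5-ppo-13b-nectar-60k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 13B weights
    device_map="auto",          # let accelerate place layers on available devices
)
```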
Key Capabilities
- Helpful Assistant: Designed and trained to act as a conversational assistant.
- Preference Learning: Leverages Proximal Policy Optimization (PPO) with a reward model for improved alignment based on preference feedback.
- Instruction Following: Initially fine-tuned on a diverse mix of human-created instructions and synthetic dialogues from the Tulu V2 dataset.
- Specific Input Format: Optimized for a chat template using `<|user|>` and `<|assistant|>` tags, requiring a newline after `<|assistant|>` for optimal generation quality (see the sketch after this list).
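Building on the loading sketch above, here is a minimal example of constructing a prompt in the expected format and generating a response. The question text and sampling parameters are illustrative assumptions, not tuned defaults:

```python
# Build a prompt in the Tulu chat format: <|user|> and <|assistant|> tags,
# with a newline after <|assistant|> (omitting it can degrade output quality).
prompt = "<|user|>\nExplain PPO in one sentence.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,      # illustrative sampling settings
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```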
Good For
- Applications requiring a helpful, instruction-following AI assistant.
- Research into PPO-based alignment methods and learning from preference feedback, as detailed in the associated paper: Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback.
Limitations
- The model has not been explicitly aligned for safety during the RLHF phase and lacks in-the-loop filtering of responses, so it can produce problematic outputs when prompted to do so.