allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Jun 11, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts is a 13-billion-parameter language model developed by AllenAI and fine-tuned from Llama-2-13b-hf. It belongs to the Tulu V2.5 series and was trained with PPO using a 70B UltraFeedback reward model and a mixture of prompts, with the goal of acting as a helpful assistant. The model is optimized for instruction following and general conversational tasks, building on the alignment methods of the Tulu 2 suite. It has a context length of 4,096 tokens and is intended for English-language applications.


Tulu V2.5 PPO 13B - UltraFeedback Mean w. 70B UF RM & Mixed Prompts

This model, developed by AllenAI, is a 13 billion parameter language model fine-tuned from meta-llama/Llama-2-13b-hf. It is part of the Tulu V2.5 series, which focuses on creating helpful assistant models through advanced alignment techniques. During Proximal Policy Optimization (PPO) training, a 70B reward model (RM) trained on UltraFeedback data was used to score completions over a mixed prompt set. This setup is part of the series' effort to disentangle best practices for learning from preference feedback, as detailed in the accompanying research paper.
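
As a rough usage sketch, the checkpoint can be loaded with the Hugging Face transformers library. The repo ID below is taken from the model name on this page; the dtype and device settings are illustrative assumptions, not requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID from the model name above; the weights are assumed to be hosted on the Hub.
model_id = "allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so a 13B model fits on a single large GPU
    device_map="auto",           # requires the accelerate package
)
```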

Key Capabilities

  • Helpful Assistant: Designed to act as a helpful conversational assistant.
  • RLHF Tuned: Utilizes Reinforcement Learning from Human Feedback (RLHF) via PPO for improved alignment.
  • Instruction Following: Fine-tuned on a diverse mix of human-created instructions and synthetic dialogues.
  • Standard Input Format: Employs a specific <|user|> and <|assistant|> chat template for optimal performance.

Intended Uses & Limitations

This model is suitable for general-purpose conversational AI and instruction-following tasks in English. Note that, unlike some other models, Tulu V2.5 was not aligned to produce safe completions during the RLHF phase, nor is it deployed with in-the-loop filtering of responses. It can therefore produce problematic outputs, especially when explicitly prompted to do so. Users should be aware of these limitations and of the potential biases and risks inherent in large language models.