Name: allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: allenai

Model Overview

allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm is a 13-billion-parameter language model from the Tulu V2.5 series, developed by AllenAI. It is fine-tuned from the meta-llama/Llama-2-13b-hf base model and aligned using Proximal Policy Optimization (PPO). Its key differentiator is the training setup. It uses the UltraFeedback dataset, with the per-aspect (fine-grained) scores deciding which responses count as chosen and rejected. During PPO it is optimized against a 70B-parameter reward model (RM) trained on UltraFeedback.
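To make "PPO against a reward model" concrete, here is a minimal sketch of the KL-penalized reward commonly used in RLHF-style PPO. It assumes the usual formulation: each generated token is penalized by how far the policy has drifted from a frozen reference model, and the RM's scalar score is added on the final token. The function name `shaped_rewards` and the default `beta` value are hypothetical and illustrative only. The exact coefficients and implementation details used to train this model are not stated on this page.

```python
from typing import List


def shaped_rewards(
    rm_score: float,
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.05,  # hypothetical KL coefficient, for illustration only
) -> List[float]:
    """Per-token rewards for one sampled response (illustrative sketch).

    Each token gets -beta * (log pi(token) - log pi_ref(token)), a per-sample
    estimate of the KL divergence from the reference policy. The scalar
    reward-model score is added to the final token only.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have equal length")
    if not policy_logprobs:
        raise ValueError("response must contain at least one token")

    # KL penalty applied at every generated token.
    rewards = [-beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    # Sequence-level reward-model score credited at the last token.
    rewards[-1] += rm_score
    return rewards


if __name__ == "__main__":
    # Two-token response: the first token drifted from the reference, the second did not.
    print(shaped_rewards(2.0, [-1.0, -0.5], [-1.2, -0.5], beta=0.1))
```

In a full PPO loop, these per-token rewards would feed advantage estimation (for example GAE) before the clipped policy update. That machinery is omitted here.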

Key Capabilities & Performance

Generalist Assistant: Designed to act as a helpful assistant across a wide range of tasks.
PPO Alignment: Trained with PPO against a 70B RM, following the setup studied in the paper "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback" (arXiv:2406.09279).
Strong Performance: This 13B model matches or outperforms Tulu 2+DPO 13B and, on some benchmarks, even Tulu 2+DPO 70B, most notably in AlpacaEval 2 win rate (26.7% vs. 21.2%).
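Input Format: Expects a specific chat format, with each tag on its own line and a newline after <|assistant|>:
<|user|>
Your message here!
<|assistant|>
Deviating from this format, including omitting the newline after <|assistant|>, can noticeably reduce generation quality.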
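The chat format above can be produced with a small helper. This is a minimal sketch for a single user turn. It assumes the newline-separated layout shown on the Tulu 2 / 2.5 model cards: each tag on its own line, with a trailing newline after <|assistant|>. The function name `format_tulu_prompt` is hypothetical, and multi-turn conversations are not handled here.

```python
def format_tulu_prompt(user_message: str) -> str:
    """Wrap one user message in the <|user|> / <|assistant|> chat format.

    Each tag sits on its own line, and the prompt ends with a newline after
    <|assistant|> so the model begins its reply on a fresh line.
    """
    # Strip surrounding whitespace so the message does not add stray blank lines.
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"


if __name__ == "__main__":
    # repr() makes the embedded newlines visible.
    print(repr(format_tulu_prompt("Your message here!")))
```

The returned string would be sent as a raw text-completion prompt. Chat-style endpoints that apply their own template should not receive it pre-formatted, or the tags would be duplicated.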

Intended Uses & Limitations

This model is suitable for general assistant-like applications. However, unlike models such as ChatGPT, it has not been explicitly aligned for safety, so it can produce problematic outputs, especially when prompted to do so. Users should also be aware of potential biases inherited from the Llama 2 base model and from the training data.
