allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023
Text Generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4k · Published: Jun 11, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights · Cold

The allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023 model is a 13 billion parameter language model developed by AllenAI, fine-tuned from Llama-2-13b-hf. It is part of the Tulu V2.5 series and was trained with PPO on the Chatbot Arena 2023 dataset to function as a helpful assistant. It generates conversational responses and uses preference feedback to improve alignment.


Model Overview

allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023 is a 13 billion parameter language model developed by AllenAI, building upon the Tulu 2 suite. It is fine-tuned from meta-llama/Llama-2-13b-hf and specifically aligned using Proximal Policy Optimization (PPO) on the Chatbot Arena 2023 dataset. This model is designed to act as a helpful assistant, leveraging a 13B Reward Model also trained on Chatbot Arena data.
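The PPO alignment described above optimizes a clipped surrogate objective against the reward model's preference signal. As an illustrative sketch only (not AllenAI's actual training code), the per-token clipped loss at the heart of PPO can be written as:

```python
import math

def ppo_clipped_loss(logp_new: float, logp_old: float,
                     advantage: float, eps: float = 0.2) -> float:
    """Clipped PPO surrogate loss for a single token/action.

    logp_new / logp_old are log-probabilities under the current and
    rollout policies; advantage is a reward-model-derived advantage
    estimate. eps is the clipping range (0.2 is a common default;
    the value used for Tulu V2.5 is not stated here).
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # PPO maximizes the minimum of the unclipped and clipped
    # objectives; as a loss, we minimize its negation.
    return -min(ratio * advantage, clipped * advantage)
```

Clipping limits how far a single update can move the policy away from the rollout policy, which stabilizes RLHF training against a learned (and imperfect) reward model.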

Key Capabilities

  • Assistant-like Behavior: Trained to generate helpful and conversational responses.
  • Preference-based Alignment: Utilizes PPO with a reward model trained on human preference data from Chatbot Arena for improved response quality.
  • English Language Support: Primarily focused on English language tasks.
  • Standard Chat Format: Optimized for a specific user/assistant chat template; adhering to it yields the most consistent, highest-quality generations.

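The chat format mentioned above can be applied with a small helper. The template below follows the `<|user|>`/`<|assistant|>` format published for the Tulu model family, with each marker followed by a newline; verify it against the model card before relying on it:

```python
def format_tulu_prompt(user_message: str) -> str:
    """Wrap a user message in the Tulu user/assistant chat template.

    The trailing "<|assistant|>\n" cues the model to generate the
    assistant turn. Deviating from the training template can degrade
    output quality.
    """
    return f"<|user|>\n{user_message}\n<|assistant|>\n"

prompt = format_tulu_prompt("What is PPO?")
```

The resulting string can be passed directly to a text-generation pipeline or tokenizer as the model input.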
Intended Uses & Limitations

This model is suitable for applications requiring conversational AI, particularly those that benefit from alignment via preference feedback. It was first fine-tuned on a diverse mix of human and synthetic instructions from the Tulu V2 mix dataset before PPO training. Users should be aware that, like many LLMs, it did not undergo extensive safety alignment during the RLHF phase and may produce problematic outputs, especially when explicitly prompted to do so. The model is licensed under Apache 2.0.