Name: allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: allenai

Model Overview

allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm is an 8-billion-parameter language model developed by AllenAI as part of the Tulu V2.5 series. It is built on the Meta Llama 3 architecture and fine-tuned with Proximal Policy Optimization (PPO). Training uses the UltraFeedback dataset, with its per-aspect (fine-grained) scores used for preference learning, and is guided by a 70-billion-parameter UltraFeedback reward model.

Key Capabilities

Helpful Assistant: Designed to function as a helpful assistant, making it suitable for conversational AI and instruction-following tasks.
PPO Fine-tuning: Trained with PPO against a 70B-parameter reward model, which is intended to improve alignment with human preferences.
Llama 3 Base: Utilizes the Llama 3 base model, providing a strong foundation for general language understanding and generation.
Chat Format: Expects input in a specific chat format (<|user|> Your message here! <|assistant|> ); a chat template is provided for consistent formatting.
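As an illustration of the chat format above, here is a minimal Python sketch that renders a conversation into the <|user|> / <|assistant|> layout. The helper name build_tulu_prompt is hypothetical, and placing each tag and message on its own line is an assumption, since the listing shows the format on a single line. The chat template shipped with the model remains the authoritative source for exact whitespace.

```python
def build_tulu_prompt(messages):
    """Render {"role", "content"} dicts into the <|user|>/<|assistant|> chat format.

    Hypothetical helper: the newline placement is an assumption, not taken
    from the listing, which shows the format flattened onto one line.
    """
    parts = []
    for message in messages:
        role = message["role"]
        # Only the two roles shown in the listing are accepted here.
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}\n")
    # Finish with an open assistant turn so the model writes the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_tulu_prompt([{"role": "user", "content": "Your message here!"}]))
```

If the model is loaded through Hugging Face transformers, the tokenizer's apply_chat_template method would normally be preferable to hand-built strings, because it applies the template distributed with the model.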

Performance Highlights

Although it has only 8B parameters, the model achieves a length-controlled (LC) win rate of 28.8 on AlpacaEval 2, outperforming some larger 13B Tulu V2.5 models on this metric. For evaluation and training details, see the associated paper, "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback".

Intended Uses

This model is intended for applications that need a helpful, instruction-following assistant. Like other Tulu models, it was not explicitly aligned for safety during the RLHF phase and may produce problematic outputs if prompted to do so.
