allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts
allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts is an 8-billion-parameter Llama 3-based language model developed by AllenAI, fine-tuned with PPO using a 70B UltraFeedback reward model and a mixed prompt set. It is designed as a helpful assistant, performing well in conversational tasks and on reasoning benchmarks such as GSM8k. The model is part of the Tulu V2.5 suite, offering enhanced alignment and an 8192-token context length for diverse applications.
Model Overview
This model, allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts, is an 8-billion-parameter Llama 3-based language model from AllenAI's Tulu V2.5 suite, designed as a helpful assistant. It was fine-tuned using Proximal Policy Optimization (PPO), leveraging a 70B UltraFeedback reward model and a diverse set of mixed prompts, including prompts drawn from the UltraFeedback dataset. This approach aims to enhance its conversational capabilities and alignment.
Key Capabilities & Performance
- Assistant-like Behavior: Trained to act as a helpful assistant through PPO fine-tuning.
- Reasoning: Achieves 48.5% 8-shot chain-of-thought (CoT) accuracy on GSM8k.
- Alignment: Demonstrates a 27.5% length-controlled (LC) win rate on AlpacaEval 2, indicating strong preference alignment.
- Training: Built upon the Meta Llama 3 architecture and further aligned using PPO with per-aspect/fine-grained scores from the UltraFeedback dataset.
Use Cases
This model is well-suited for applications requiring a capable conversational AI assistant, particularly where strong reasoning and adherence to user preferences are important. Its PPO-based alignment with a robust reward model makes it effective for generating helpful and aligned responses in various interactive scenarios.
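For the interactive scenarios described above, the model can be loaded with the Hugging Face `transformers` library. This is a hedged sketch: it assumes enough GPU memory for an 8B model in bfloat16, and the `<|user|>`/`<|assistant|>` chat format shown in `build_prompt` is the Tulu-style template reported for this model family; verify it against the tokenizer's own chat template before relying on it.

```python
model_id = "allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts"


def build_prompt(user_message: str) -> str:
    # Assumed Tulu-style chat format: one user turn, then an assistant header
    # that cues the model to respond.
    return f"<|user|>\n{user_message}\n<|assistant|>\n"


if __name__ == "__main__":
    # Heavy imports are deferred so the prompt helper is usable standalone.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer(build_prompt("What is 12 * 7?"), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here for reproducible assistant-style answers; sampling parameters can be tuned for more varied conversational output.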