allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts
allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts is an 8-billion-parameter Llama 3-based language model developed by AllenAI, fine-tuned with PPO using a 70B UltraFeedback reward model and a mixed prompt set. It is designed as a helpful assistant, performing well in conversational tasks and on reasoning benchmarks such as GSM8k. The model is part of the Tulu V2.5 suite, offering enhanced alignment and an 8192-token context length for diverse applications.
Model Overview
This model, allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts, is an 8-billion-parameter Llama 3-based language model from AllenAI's Tulu V2.5 suite, designed as a helpful assistant. It was fine-tuned using Proximal Policy Optimization (PPO), leveraging a 70B UltraFeedback reward model and a diverse set of mixed prompts, including prompts drawn from the UltraFeedback dataset. This approach aims to enhance its conversational capabilities and alignment.
Key Capabilities & Performance
- Assistant-like Behavior: Trained to act as a helpful assistant through PPO fine-tuning.
- Reasoning: Achieves 48.5% 8-shot chain-of-thought (CoT) accuracy on GSM8k.
- Alignment: Demonstrates a 27.5% length-controlled (LC) win rate on AlpacaEval 2, indicating strong preference alignment.
- Training: Built upon the Meta Llama 3 architecture and further aligned using PPO with per-aspect/fine-grained scores from the UltraFeedback dataset.
Use Cases
This model is well-suited for applications requiring a capable conversational AI assistant, particularly where strong reasoning and adherence to user preferences are important. Its PPO-based alignment with a robust reward model makes it effective for generating helpful and aligned responses in various interactive scenarios.
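For the interactive scenarios described above, the model can be loaded with the Hugging Face `transformers` library. This is a hedged sketch: it assumes enough GPU memory for an 8B model in bfloat16, and the `<|user|>`/`<|assistant|>` chat format shown in `build_prompt` is the Tulu-style template reported for this model family; verify it against the tokenizer's own chat template before relying on it.

```python
model_id = "allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts"


def build_prompt(user_message: str) -> str:
    # Assumed Tulu-style chat format: one user turn, then an assistant header
    # that cues the model to respond.
    return f"<|user|>\n{user_message}\n<|assistant|>\n"


if __name__ == "__main__":
    # Heavy imports are deferred so the prompt helper is usable standalone.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer(build_prompt("What is 12 * 7?"), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here for reproducible assistant-style answers; sampling parameters can be tuned for more varied conversational output.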