allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm
Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4K | Published: Jun 11, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm is a 13 billion parameter language model from the Tulu V2.5 series, developed by AllenAI and fine-tuned from Llama-2-13b-hf. It was trained with Proximal Policy Optimization (PPO) on the UltraFeedback dataset, using a 70B parameter reward model for alignment. The model functions as a helpful assistant with strong generalist performance, outperforming Tulu V2 DPO 13B across a range of benchmarks, most notably on AlpacaEval 2 win rate.
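
Since the weights are open, the model can be loaded with the Hugging Face transformers library. Below is a minimal usage sketch; the <|user|>/<|assistant|> chat markup and the generation parameters are assumptions based on the Tulu series' conventions, not details taken from this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 13B model on GPU
    device_map="auto",
)

# Tulu models use a simple plain-text chat format (assumed here):
# "<|user|>\n{message}\n<|assistant|>\n"
prompt = "<|user|>\nExplain PPO-based alignment in one paragraph.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```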
