The allenai/tulu-v2.5-ppo-13b-uf-mean model is a 13-billion-parameter language model from the Allen Institute for AI (AI2), fine-tuned from Llama-2-13b-hf. Part of the Tulu V2.5 series, it was trained with Proximal Policy Optimization (PPO) on preference data from the UltraFeedback dataset, using the mean of UltraFeedback's fine-grained aspect scores as the preference signal (hence the "uf-mean" suffix). The result is a model optimized for generating helpful, well-aligned conversational responses.
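As a sketch of how one might query the model, the snippet below builds a prompt in the Tulu-style chat template (`<|user|>` / `<|assistant|>` role markers, as used across the Tulu 2 series) and shows a commented-out `transformers` generation call. The helper function name and the generation parameters are illustrative assumptions, not an official snippet from the model card.

```python
# Hypothetical sketch: formatting a prompt in the Tulu chat template.
# The role-marker format follows the Tulu 2 series; the helper below is
# an illustrative assumption, not part of the official model card.

def format_tulu_prompt(user_message: str) -> str:
    """Wrap a user message in the Tulu-style chat template."""
    return f"<|user|>\n{user_message}\n<|assistant|>\n"

prompt = format_tulu_prompt("What is preference learning?")
# `prompt` now carries the role markers the model was trained with.

# Generation sketch (the 13B weights are large, so this is commented out):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# name = "allenai/tulu-v2.5-ppo-13b-uf-mean"
# tok = AutoTokenizer.from_pretrained(name)
# model = AutoModelForCausalLM.from_pretrained(name)
# out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```

The tokenizer shipped with the model also exposes this template via `apply_chat_template`, which is generally the safer route in practice.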