allenai/tulu-v2.5-dpo-13b-uf-mean
Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4k · Published: Jun 10, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights · Cold

allenai/tulu-v2.5-dpo-13b-uf-mean is a 13-billion-parameter language model developed by AllenAI (the Allen Institute for AI), fine-tuned from Meta's Llama-2-13b-hf. It belongs to the Tulu V2.5 series, whose models are trained with preference-learning methods including DPO (Direct Preference Optimization) and PPO (Proximal Policy Optimization); this particular checkpoint was trained with DPO on the UltraFeedback dataset, with per-aspect preference scores aggregated by their mean (the "uf-mean" suffix). It is designed to serve as a helpful assistant, leveraging preference feedback to improve response quality.
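To make the training objective concrete, the following is a minimal sketch of the standard DPO loss for a single preference pair. It assumes you already have summed log-probabilities of the chosen and rejected responses under both the policy being trained and the frozen reference model; the function name, argument names, and the default `beta` value here are illustrative, not taken from the Tulu training code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))).

    The loss shrinks as the policy raises the chosen response's
    log-probability relative to the reference model and lowers the
    rejected response's, which is how preference feedback is distilled
    into the model without a separate reward model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference exactly, the margin is zero
# and the loss is log(2), the value before any preference learning.
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
```

A policy that prefers the chosen response more than the reference does yields a loss below this `log(2)` baseline; one that prefers the rejected response yields a loss above it.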
