princeton-nlp/Llama-3-Instruct-8B-RRHF
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: Jul 6, 2024 | Architecture: Transformer | Warm

Llama-3-Instruct-8B-RRHF is an 8-billion-parameter instruction-tuned language model released by princeton-nlp. It is fine-tuned with RRHF (Rank Responses to align Human Feedback), following the training setup described in the SimPO preprint. RRHF optimizes preferences by ranking candidate responses according to their length-normalized log-probability under the model being trained, so it does not require a separate reference model. The model is intended for general instruction-following tasks.
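
To make the reference-free objective concrete, here is a minimal sketch of an RRHF-style loss for a single preference pair. It assumes the commonly cited pairwise form: a hinge ranking term on length-normalized log-probabilities plus a weighted supervised term on the preferred response. The function name `rrhf_pair_loss` and the `sft_weight` parameter are illustrative, not taken from the princeton-nlp training code, and the inputs are plain lists of per-token log-probabilities rather than real model outputs.

```python
def rrhf_pair_loss(chosen_token_logps, rejected_token_logps, sft_weight=1.0):
    """Illustrative RRHF-style loss for one (chosen, rejected) pair.

    Both arguments are lists of per-token log-probabilities that the policy
    assigns to each response. No reference model appears anywhere.
    """
    # Length-normalized scores: average log-probability per token.
    score_chosen = sum(chosen_token_logps) / len(chosen_token_logps)
    score_rejected = sum(rejected_token_logps) / len(rejected_token_logps)

    # Ranking term: penalize only when the rejected response scores higher.
    rank_loss = max(0.0, score_rejected - score_chosen)

    # Supervised term: negative log-likelihood of the chosen response.
    sft_loss = -sum(chosen_token_logps)

    # sft_weight is a hypothetical name for the weighting hyperparameter.
    return rank_loss + sft_weight * sft_loss


# Policy already prefers the chosen response: the ranking term is zero.
already_ranked = rrhf_pair_loss([-0.5, -0.5], [-1.0, -1.0, -1.0])

# Policy prefers the rejected response: the ranking term is active.
misranked = rrhf_pair_loss([-1.0, -1.0, -1.0], [-0.5, -0.5])
```

Because both terms depend only on the policy's own log-probabilities, no frozen reference model has to be kept in memory during training, which is the property the description above refers to.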
