kwchoi/DPO_mistral_7b_alpaca_0124_v1
Text Generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Ctx length: 4K · Published: Jan 24, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7-billion-parameter Mistral-Instruct model fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. Built on Mistral-7B-Instruct-v0.2, the model was trained to study the effect of DPO on instruction-following capability. It achieves an average score of 61.15 on the Open LLM Leaderboard, with notable results on HellaSwag (73.20) and Winogrande (77.19).
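For reference, the DPO objective mentioned above can be sketched in a few lines. This is a minimal illustration of the per-example DPO loss, not the model's actual training code; the function name, argument names, and the default `beta=0.1` are assumptions for the example. It takes the log-probabilities that the policy and the frozen reference model assign to a preferred ("chosen") and a dispreferred ("rejected") response:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Log-ratios measure how much the policy has moved away from the
    # reference model on each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin between the chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin): small when the policy clearly prefers the
    # chosen response relative to the reference, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is `log 2`; as the policy learns to rank the chosen response above the rejected one, the margin grows and the loss falls toward zero.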
