kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch
Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Feb 2, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch is a 7-billion-parameter Mistral-Instruct model fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. Developed by kwchoi, the model explores the effects of DPO on the Mistral-7B-Instruct-v0.2 base model. It achieves an average score of 58.32 on the Open LLM Leaderboard, with particular strengths in HellaSwag (76.78) and Winogrande (73.40).
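For readers unfamiliar with DPO, a minimal sketch of its per-example objective may help. The sketch below is illustrative, not this model's actual training code; the log-probability values and the beta of 0.1 are assumptions for demonstration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model (here, the Mistral-7B-Instruct-v0.2 base).
    """
    # How much more the policy favors each response than the reference does
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin, scaled by beta
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)): shrinks as the policy learns the preference
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Example: policy favors the chosen response relative to the reference,
# so the loss drops below the indifference point of -log(0.5) ≈ 0.693.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

Training on preference pairs this way pushes the policy toward responses humans preferred without fitting a separate reward model.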
