kwchoi/DPO_mistral_v01_7b_ultra_0130_1k
Text generation | 7B parameters | FP8 quantization | 4k context length | Published: Jan 30, 2024 | License: apache-2.0 | Architecture: Transformer

kwchoi/DPO_mistral_v01_7b_ultra_0130_1k is a 7-billion-parameter model fine-tuned by kwchoi from Mistral-7B-Instruct-v0.2 using Direct Preference Optimization (DPO) on the Orca DPO dataset. It was developed to study the effects of DPO on the base model, and achieves an average score of 57.83 on the Open LLM Leaderboard, demonstrating capability across various reasoning and language-understanding tasks.
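To illustrate what DPO fine-tuning optimizes, here is a minimal sketch of the per-pair DPO loss in pure Python. This is not the training code used for this model; the log-probability values and the `beta=0.1` setting are made-up examples for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are the summed log-probabilities of the chosen and
    rejected responses under the policy being trained and under
    the frozen reference model (here, the base instruct model).
    """
    # Implicit reward margin: difference of the policy/reference
    # log-ratios for the chosen vs. rejected response.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Negative log-sigmoid of the margin; minimizing this pushes
    # the policy toward the chosen response relative to the reference.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Example with hypothetical log-probs: the policy already favors the
# chosen response more than the reference does, so the loss falls
# below log(2), the value at zero margin.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.1)
print(round(loss, 4))  # prints 0.5544
```

At a zero margin the loss equals log(2) ≈ 0.693; training on preference pairs drives it lower by widening the margin between chosen and rejected responses.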
