kwchoi/DPO_mistral_7b_ultra_0124_v1
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 25, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

kwchoi/DPO_mistral_7b_ultra_0124_v1 is a 7-billion-parameter model created by kwchoi by fine-tuning Mistral-7B-Instruct-v0.2 with Direct Preference Optimization (DPO) on the Orca DPO dataset, exploring the effect of DPO on an already instruction-tuned base model. It achieves an average score of 64.45 on the Open LLM Leaderboard, demonstrating capability in reasoning, common-sense, and language-understanding tasks, and is suited both to research into DPO fine-tuning and to general instruction-following applications.
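The DPO objective mentioned above trains the policy directly on preference pairs, without a separate reward model. A minimal sketch of the per-pair loss (function name and the `beta=0.1` default are illustrative, not taken from this model's training configuration):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference
    model; beta controls how far the policy may drift from the reference.
    """
    # Implicit rewards: log-probability ratios scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)): shrinks as the policy favors the chosen answer.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; raising the policy's log-probability on the chosen response lowers the loss, which is the gradient signal DPO uses.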
