kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 2, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch is a 7-billion-parameter model built on Mistral-7B-Instruct-v0.2 and fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. Developed by kwchoi, it explores the effects of DPO on the Mistral-7B-Instruct-v0.2 base model. It achieves an average score of 58.32 on the Open LLM Leaderboard, with particular strengths in HellaSwag (76.78) and Winogrande (73.40).


Model Overview

kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch is a 7-billion-parameter language model based on the Mistral-7B-Instruct-v0.2 architecture. It was fine-tuned by kwchoi with Direct Preference Optimization (DPO) on the Orca DPO dataset, as a study of how effective DPO is when applied to an already instruction-tuned model.

Performance Highlights

Evaluated on the Open LLM Leaderboard, this model demonstrates a competitive average performance of 58.32. Notable scores include:

  • HellaSwag (10-shot): 76.78
  • Winogrande (5-shot): 73.40
  • AI2 Reasoning Challenge (25-shot): 55.97
  • MMLU (5-shot): 55.97
  • TruthfulQA (0-shot): 57.94
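Note that the mean of the five scores listed above is higher than the reported 58.32 average; the Open LLM Leaderboard of that period averaged over six tasks (including GSM8K, which is not listed here), so the two figures are not directly comparable. A quick check of the listed scores:

```python
# Benchmark scores as listed on the model card (Open LLM Leaderboard).
scores = {
    "HellaSwag (10-shot)": 76.78,
    "Winogrande (5-shot)": 73.40,
    "ARC (25-shot)": 55.97,
    "MMLU (5-shot)": 55.97,
    "TruthfulQA (0-shot)": 57.94,
}

# Mean of the five listed tasks only; the leaderboard's 58.32 average
# also folds in GSM8K, which the card does not report.
listed_mean = round(sum(scores.values()) / len(scores), 2)
```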

Use Cases

This model is particularly suitable for:

  • Researchers interested in the practical application and effects of Direct Preference Optimization (DPO) on large language models.
  • Developers seeking a Mistral-based model with DPO fine-tuning for general instruction-following tasks.
  • Applications requiring strong performance in common sense reasoning and question answering, as indicated by its HellaSwag and Winogrande scores.
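For researchers probing DPO's mechanics, the objective itself is compact: it rewards the policy for increasing the log-probability margin of the preferred response over the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair DPO loss, assuming summed sequence log-probabilities are already computed (the function and its inputs are illustrative, not taken from this model's training code):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for one preference pair.

    Inputs are the summed token log-probabilities of the chosen and
    rejected responses under the policy and the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    return math.log1p(math.exp(-margin))

# When the policy prefers the chosen response more than the reference
# does, the margin is positive and the loss drops below log(2).
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.1)
```

The `beta` hyperparameter controls how strongly the policy is allowed to deviate from the reference model; in practice a trainer such as TRL's `DPOTrainer` handles the batching and log-probability computation.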