kwchoi/DPO_mistral_v01_7b_ultra_0130_1k

Text generation · 7B parameters · FP8 quantization · 4k context length · Published: Jan 30, 2024 · License: apache-2.0 · Architecture: Transformer

The kwchoi/DPO_mistral_v01_7b_ultra_0130_1k is a 7-billion-parameter model fine-tuned by kwchoi from Mistral-7B-Instruct-v0.2 using the Orca DPO dataset. It was developed to study the effects of Direct Preference Optimization (DPO) on an instruction-tuned base model, and it achieves an average score of 57.83 on the Open LLM Leaderboard across reasoning and language-understanding benchmarks.


Model Overview

The kwchoi/DPO_mistral_v01_7b_ultra_0130_1k is a 7 billion parameter language model developed by kwchoi. It is based on the Mistral-7B-Instruct-v0.2 architecture and has been fine-tuned using the Direct Preference Optimization (DPO) method with the Orca DPO dataset. The primary goal of this model's development was to investigate the impact and effectiveness of DPO on an instruction-tuned Mistral base model.
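To make the fine-tuning method concrete, here is a minimal sketch of the DPO objective in plain Python. This is an illustration of the general DPO loss, not the author's training code; the function name and the default `beta=0.1` are assumptions for the example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model; beta controls how far the policy may drift
    from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written stably as log(1 + exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy and reference agree, the loss sits at log 2; it falls below that as the policy learns to prefer the chosen response more strongly than the reference does.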

Key Capabilities & Performance

This model demonstrates general language understanding and reasoning abilities, as evaluated on the Hugging Face Open LLM Leaderboard. It achieved an overall average score of 57.83 across various benchmarks. Specific performance metrics include:

  • AI2 Reasoning Challenge (25-shot): 57.17
  • HellaSwag (10-shot): 79.16
  • MMLU (5-shot): 55.85
  • TruthfulQA (0-shot): 55.62
  • Winogrande (5-shot): 72.85
  • GSM8k (5-shot): 26.31

Intended Use Cases

This model is suited to research and experimentation, particularly for studying the effects of DPO fine-tuning on instruction-following models. Its benchmark performance suggests it can also serve general-purpose conversational AI, text generation, and reasoning tasks where a 7B-parameter model is preferred for efficiency or deployment constraints.
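For experimentation, the model can presumably be loaded like any Hugging Face causal LM. The sketch below assumes the `transformers` and `torch` packages are installed; only the repo id comes from this card, and the prompt format is the standard Mistral-Instruct template inherited from the base model.

```python
# Hypothetical usage sketch; repo id taken from the model card.
MODEL_ID = "kwchoi/DPO_mistral_v01_7b_ultra_0130_1k"

def format_mistral_prompt(user_message: str) -> str:
    """Wrap a user message in the Mistral-Instruct chat template."""
    return f"<s>[INST] {user_message} [/INST]"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the prompt helper is usable without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(format_mistral_prompt(user_message),
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Note that generation requires downloading roughly 14 GB of weights in FP16 (less if a quantized variant is used), so a GPU with sufficient memory or `device_map="auto"` offloading is advisable.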