kwchoi/DPO_mistral_7b_alpaca_0124_v1

Text Generation

  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Jan 24, 2024
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7 billion parameter model based on Mistral-7B-Instruct-v0.2 and fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. The fine-tune was created to study the effects of DPO on instruction-following capabilities. It achieves an average score of 61.15 on the Open LLM Leaderboard, with notable results in HellaSwag (73.20) and Winogrande (77.19).


Model Overview

The kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7 billion parameter language model developed by kwchoi. It is built upon the Mistral-7B-Instruct-v0.2 architecture and has been fine-tuned using the Direct Preference Optimization (DPO) method. The primary goal of this model's development was to investigate the impact and effectiveness of DPO on instruction-following capabilities, utilizing the Orca DPO dataset for training.
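For readers unfamiliar with the method, DPO trains the policy directly on preference pairs: it rewards the policy for assigning relatively more probability (versus a frozen reference model) to the chosen response than to the rejected one. A minimal sketch of the per-pair loss is below; the function name and scalar interface are illustrative, not taken from this model's training code:

```python
import math

def dpo_loss(pi_logp_chosen: float, pi_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the policy (pi) or the frozen reference (ref).
    """
    chosen_reward = beta * (pi_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (pi_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # Logistic loss on the reward margin: -log(sigmoid(margin))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree, the margin is zero and the loss is log 2; raising the policy's likelihood of the chosen response relative to the reference drives the loss down.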

Key Capabilities & Performance

This model demonstrates solid performance across various benchmarks, as evaluated on the Open LLM Leaderboard. It achieved an average score of 61.15.

Key benchmark results include:

  • AI2 Reasoning Challenge (25-Shot): 63.40
  • HellaSwag (10-Shot): 73.20
  • MMLU (5-Shot): 60.51
  • TruthfulQA (0-shot): 66.76
  • Winogrande (5-shot): 77.19
  • GSM8k (5-shot): 25.85

Detailed evaluation results are available on the Open LLM Leaderboard.
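The headline figure of 61.15 is simply the unweighted mean of the six benchmark scores above, which is easy to verify:

```python
# Open LLM Leaderboard scores listed above for this model
scores = {
    "ARC (25-shot)": 63.40,
    "HellaSwag (10-shot)": 73.20,
    "MMLU (5-shot)": 60.51,
    "TruthfulQA (0-shot)": 66.76,
    "Winogrande (5-shot)": 77.19,
    "GSM8k (5-shot)": 25.85,
}
# Unweighted mean, rounded to two decimals as reported
average = round(sum(scores.values()) / len(scores), 2)
```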

Good For

  • Research into DPO effects: Ideal for researchers and developers interested in understanding how Direct Preference Optimization influences model behavior and performance.
  • Instruction-following tasks: Suitable for applications requiring a model to adhere to specific instructions, given its DPO fine-tuning on an instruction-focused dataset.
  • General language generation: Can be used for a variety of natural language processing tasks where a 7B parameter model with a 4096 token context length is appropriate.
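For instruction-following use, the base Mistral-7B-Instruct-v0.2 model expects user turns wrapped in `[INST] ... [/INST]` tags. A minimal prompt-building helper is sketched below; the helper name is ours, and we assume this fine-tune preserves the base model's chat template:

```python
def format_mistral_prompt(instruction: str) -> str:
    """Wrap a single user instruction in the Mistral-Instruct turn format.

    Assumes the fine-tune keeps the base model's [INST] template.
    """
    return f"<s>[INST] {instruction.strip()} [/INST]"

prompt = format_mistral_prompt("Summarize DPO in one sentence.")
```

The resulting string can be tokenized and passed to the model's `generate` method; in practice, a tokenizer's built-in chat template (where available) is the safer way to produce this formatting.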