kwchoi/DPO_mistral_7b_alpaca_0124_v1
kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7-billion-parameter model built on Mistral-7B-Instruct-v0.2 and fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. It was created to study the effect of DPO on instruction-following capabilities. The model achieves an average score of 61.15 on the Open LLM Leaderboard, with notable results on Winogrande (77.19) and HellaSwag (73.20).
Model Overview
The kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7 billion parameter language model developed by kwchoi. It is built upon the Mistral-7B-Instruct-v0.2 architecture and has been fine-tuned using the Direct Preference Optimization (DPO) method. The primary goal of this model's development was to investigate the impact and effectiveness of DPO on instruction-following capabilities, utilizing the Orca DPO dataset for training.
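DPO optimizes the policy directly on preference pairs, with no separate reward model: it pushes the policy to assign a higher likelihood margin to the preferred ("chosen") response than to the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair loss follows; this is the standard DPO formulation, not this model's actual training code, and the function name and default beta are illustrative:

```python
import math

def dpo_loss(pi_logp_chosen: float, pi_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the frozen reference (ref_*).
    """
    # Implicit reward for each response: how much more likely the policy
    # makes it than the reference does, scaled by beta.
    chosen_reward = beta * (pi_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (pi_logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the reward margin: minimized when the policy
    # favors the chosen response relative to the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2; as the policy learns to prefer the chosen response, the loss falls toward zero.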
Key Capabilities & Performance
Evaluated on the Open LLM Leaderboard, the model achieves an average score of 61.15, with its strongest results on Winogrande and HellaSwag and its weakest on GSM8k.
Key benchmark results include:
- AI2 Reasoning Challenge (25-shot): 63.40
- HellaSwag (10-shot): 73.20
- MMLU (5-shot): 60.51
- TruthfulQA (0-shot): 66.76
- Winogrande (5-shot): 77.19
- GSM8k (5-shot): 25.85
Detailed evaluation results are available on the Open LLM Leaderboard.
Good For
- Research into DPO effects: Ideal for researchers and developers interested in understanding how Direct Preference Optimization influences model behavior and performance.
- Instruction-following tasks: Suitable for applications requiring a model to adhere to specific instructions, given its DPO fine-tuning on an instruction-focused dataset.
- General language generation: Can be used for a variety of natural language processing tasks where a 7B parameter model with a 4096 token context length is appropriate.
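For the instruction-following and generation uses above, the model can be loaded with the standard Hugging Face Transformers API. This is a generic sketch, not usage instructions from the model author; it assumes the checkpoint ships a tokenizer with a Mistral-style chat template, and downloading the ~7B weights requires substantial disk space and memory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    model_id = "kwchoi/DPO_mistral_7b_alpaca_0124_v1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" places weights on GPU if one is available.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Mistral-Instruct checkpoints expect the [INST] chat format;
    # apply_chat_template builds it from a messages list.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_reply("Summarize Direct Preference Optimization in one sentence."))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters such as `temperature` can be passed to `generate` for more varied output.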