Model Overview
kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7-billion-parameter language model developed by kwchoi. It is built on Mistral-7B-Instruct-v0.2 and fine-tuned with Direct Preference Optimization (DPO). The model was developed primarily to investigate how effective DPO is at improving instruction-following, using the Orca DPO dataset for training.
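Because the model inherits its chat format from Mistral-7B-Instruct-v0.2, prompts are expected to follow the `[INST] ... [/INST]` instruct template. The sketch below is an assumption based on the base model's documented v0.2 format, not code from this repository; `build_prompt` is a hypothetical helper.

```python
def build_prompt(turns):
    """Format (user, assistant) turns in the Mistral instruct style.

    `turns` is a list of (user_message, assistant_reply_or_None) pairs;
    the final turn's reply is None when asking the model to generate.
    """
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            # Completed turns close with the end-of-sequence token.
            prompt += f" {assistant}</s>"
    return prompt
```

In practice, `tokenizer.apply_chat_template` from the `transformers` library produces the same formatting from the model's bundled template, which is the safer option if the template ever changes.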
Key Capabilities & Performance
This model performs solidly across the benchmarks of the Open LLM Leaderboard, achieving an average score of 61.15.
Key benchmark results include:
- AI2 Reasoning Challenge (25-shot): 63.40
- HellaSwag (10-shot): 73.20
- MMLU (5-shot): 60.51
- TruthfulQA (0-shot): 66.76
- Winogrande (5-shot): 77.19
- GSM8k (5-shot): 25.85
Detailed evaluation results are available on the Open LLM Leaderboard.
Good For
- Research into DPO effects: Ideal for researchers and developers interested in understanding how Direct Preference Optimization influences model behavior and performance.
- Instruction-following tasks: Suitable for applications requiring a model to adhere to specific instructions, given its DPO fine-tuning on an instruction-focused dataset.
- General language generation: Can be used for a variety of natural language processing tasks where a 7B-parameter model with a 4096-token context length is appropriate.
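For readers studying the DPO effects this model was built to probe, the per-pair objective from the DPO paper can be sketched in a few lines. This is a minimal illustrative implementation; the variable names and the β default of 0.1 are assumptions, not values taken from this model's training configuration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w^policy - logp_w^ref)
                         - (logp_l^policy - logp_l^ref)))
    where y_w is the chosen and y_l the rejected completion.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable form of -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))
```

When the policy has not moved from the reference model, the margin is zero and the loss equals log 2; it shrinks as the policy assigns relatively more probability to chosen completions than the reference does.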