kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch
The kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch is a 7-billion-parameter Mistral-Instruct model fine-tuned using Direct Preference Optimization (DPO) on the Orca DPO dataset. Developed by kwchoi, this model explores the effects of DPO on the Mistral-7B-Instruct-v0.2 base model. It achieves an average score of 58.32 on the Open LLM Leaderboard, with particular strengths in HellaSwag (76.78) and Winogrande (73.40).
Model Overview
The kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch is a 7-billion-parameter language model based on the Mistral-7B-Instruct-v0.2 architecture, fine-tuned by kwchoi using Direct Preference Optimization (DPO) with the Orca DPO dataset. The model serves as a study of the impact and effectiveness of DPO on instruction-tuned models.
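To make the fine-tuning objective concrete, the DPO loss for a single preference pair can be sketched in plain Python. This is an illustrative implementation of the standard DPO formulation, not the author's training code; the `beta` value and the example log-probabilities are made-up numbers for demonstration.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Arguments are summed log-probabilities of the chosen/rejected
    responses under the trained policy (pi_*) and the frozen
    reference model (ref_*). beta scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen over the rejected response, relative to the reference.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that has shifted toward the chosen response gets a lower loss
# than one that has shifted toward the rejected response.
loss_aligned = dpo_loss(-5.0, -12.0, -8.0, -9.0)    # positive margin
loss_unaligned = dpo_loss(-12.0, -5.0, -9.0, -8.0)  # negative margin
```

When the policy matches the reference exactly, the margin is zero and the loss reduces to `log 2`, which is the starting point of DPO training.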
Performance Highlights
Evaluated on the Open LLM Leaderboard, this model demonstrates a competitive average performance of 58.32. Notable scores include:
- HellaSwag (10-Shot): 76.78
- Winogrande (5-Shot): 73.40
- AI2 Reasoning Challenge (25-Shot): 55.97
- MMLU (5-Shot): 55.97
- TruthfulQA (0-Shot): 57.94
Use Cases
This model is particularly suitable for:
- Researchers interested in the practical application and effects of Direct Preference Optimization (DPO) on large language models.
- Developers seeking a Mistral-based model with DPO fine-tuning for general instruction-following tasks.
- Applications requiring strong performance in common sense reasoning and question answering, as indicated by its HellaSwag and Winogrande scores.
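For the instruction-following use cases above, prompts should follow the chat template of the Mistral-7B-Instruct-v0.2 base model. The `[INST]` template below is an assumption inherited from the base model, not stated on this card; verify it against the tokenizer's chat template before relying on it.

```python
def build_mistral_prompt(messages):
    """Wrap alternating user/assistant turns in the Mistral [INST] template.

    Assumes the Mistral-7B-Instruct-v0.2 convention: user turns are
    enclosed in [INST] ... [/INST], assistant turns end with </s>.
    """
    prompt = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        else:  # assistant turn, closed with the end-of-sequence token
            prompt += f" {msg['content']}</s>"
    return prompt

prompt = build_mistral_prompt([
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
# → "<s>[INST] Summarize DPO in one sentence. [/INST]"
```

The weights themselves can be loaded with the Hugging Face `transformers` library via `AutoModelForCausalLM.from_pretrained("kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch")`, and the tokenizer's built-in `apply_chat_template` is the safer way to produce this prompt in practice.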