Model Overview
kwchoi/DPO_mistral_v01_7b_ultra_0130_1k is a 7-billion-parameter language model developed by kwchoi. It is based on Mistral-7B-Instruct-v0.2 and was fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. The primary goal of this model's development was to investigate the impact and effectiveness of DPO applied to an already instruction-tuned Mistral base model.
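To make the training objective concrete, a minimal sketch of the per-pair DPO loss follows. It is written in plain Python with illustrative log-probability values; the function name and the toy numbers are assumptions for exposition, not taken from this model's training code.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trained policy or the frozen reference
    model. The loss is -log(sigmoid(beta * margin)).
    """
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log sigmoid(margin), written stably as softplus(-margin)
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# Toy numbers: the policy has drifted toward the chosen response,
# so the margin is positive and the loss falls below log(2).
loss = dpo_loss(policy_chosen_lp=-12.0, policy_rejected_lp=-20.0,
                ref_chosen_lp=-14.0, ref_rejected_lp=-18.0, beta=0.1)
```

When policy and reference agree exactly, the margin is zero and the loss sits at log(2); training pushes the margin positive, driving the loss toward zero.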
Key Capabilities & Performance
This model demonstrates general language understanding and reasoning abilities, as evaluated on the Hugging Face Open LLM Leaderboard, where it achieved an overall average score of 57.83. The per-benchmark scores:
- AI2 Reasoning Challenge (25-shot): 57.17
- HellaSwag (10-shot): 79.16
- MMLU (5-shot): 55.85
- TruthfulQA (0-shot): 55.62
- Winogrande (5-shot): 72.85
- GSM8k (5-shot): 26.31
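The reported average is simply the unweighted mean of the six benchmark scores above, which the following check reproduces:

```python
# Reproduce the leaderboard average from the six benchmark scores above.
scores = {
    "ARC (25-shot)": 57.17,
    "HellaSwag (10-shot)": 79.16,
    "MMLU (5-shot)": 55.85,
    "TruthfulQA (0-shot)": 55.62,
    "Winogrande (5-shot)": 72.85,
    "GSM8k (5-shot)": 26.31,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 57.83
```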
Intended Use Cases
This model is suited to research and experimentation, particularly for studying the effects of DPO fine-tuning on instruction-following models. Its benchmark performance suggests it can also serve general-purpose conversational AI, text generation, and reasoning tasks, especially where a 7B-parameter model is preferred for efficiency or deployment constraints.
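Since the base model is Mistral-7B-Instruct-v0.2, prompts should follow the Mistral `[INST] ... [/INST]` instruction format. A minimal sketch of preparing such a prompt is shown below; the `transformers` calls are left as comments so the sketch stays self-contained, and the generation settings shown there are illustrative assumptions, not settings verified against this checkpoint.

```python
def build_mistral_prompt(user_message: str) -> str:
    """Wrap a user message in the [INST] tags used by
    Mistral-Instruct-style models (the base of this model)."""
    return f"<s>[INST] {user_message.strip()} [/INST]"

prompt = build_mistral_prompt("Summarize what DPO fine-tuning does.")

# With transformers installed, inference would look roughly like this
# (sketch only; downloads the ~7B checkpoint):
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   repo = "kwchoi/DPO_mistral_v01_7b_ultra_0130_1k"
#   tok = AutoTokenizer.from_pretrained(repo)
#   model = AutoModelForCausalLM.from_pretrained(repo)
#   inputs = tok(prompt, return_tensors="pt")
#   out = model.generate(**inputs, max_new_tokens=128)
#   print(tok.decode(out[0], skip_special_tokens=True))
```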