Model Overview
kwchoi/DPO_mistral_7b_ultra_0124_v1 is a 7-billion-parameter language model developed by kwchoi. It is based on Mistral-7B-Instruct-v0.2 and was fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. The model's primary purpose is to study the effect and efficacy of DPO fine-tuning on an already instruction-tuned base model.
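DPO trains directly on preference pairs by pushing the policy to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal sketch of the per-example objective (variable names are illustrative; this is not the model's actual training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    # Log-ratio of policy vs. reference for the chosen and rejected responses.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable -log(sigmoid(x)) == log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))

# If the policy has not shifted relative to the reference, the loss is log(2);
# it decreases as the policy prefers the chosen response more strongly.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

The `beta` hyperparameter controls how far the policy may drift from the reference model while fitting the preferences.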
Performance Metrics
Evaluated on the Open LLM Leaderboard, the model shows balanced performance across the benchmarks:
- Average Score: 64.45
- AI2 Reasoning Challenge (25-shot): 66.13
- HellaSwag (10-shot): 86.39
- MMLU (5-shot): 59.78
- TruthfulQA (0-shot): 69.45
- Winogrande (5-shot): 79.48
- GSM8k (5-shot): 25.47
These scores indicate proficiency in tasks requiring reasoning, common sense, and general knowledge, while the low GSM8k score marks mathematical problem-solving as the main area for improvement.
Key Characteristics
- Base Model: Mistral-7B-Instruct-v0.2
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Dataset: Orca DPO dataset
- Context Length: 4096 tokens
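Since the base model is Mistral-7B-Instruct-v0.2, prompts are expected in Mistral's `[INST]` chat format (in practice the tokenizer's `apply_chat_template` produces this). A minimal sketch of the template, with an illustrative helper name:

```python
def build_mistral_prompt(turns):
    """Format alternating (user, assistant) turns in Mistral's [INST] template.

    `turns` is a list of (user_message, assistant_reply) pairs; pass None as
    the final reply to leave the prompt open for the model to generate.
    """
    prompt = "<s>"
    for user_msg, assistant_reply in turns:
        prompt += f"[INST] {user_msg} [/INST]"
        if assistant_reply is not None:
            prompt += f" {assistant_reply}</s>"
    return prompt

print(build_mistral_prompt([("Summarize DPO in one sentence.", None)]))
# <s>[INST] Summarize DPO in one sentence. [/INST]
```

Keep the full formatted conversation within the 4096-token context window; earlier turns must be truncated once that budget is exceeded.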
Good For
- Research into DPO: Ideal for developers and researchers interested in understanding the impact of DPO on instruction-following models.
- General Instruction Following: Suitable for tasks requiring the model to adhere to given instructions.
- Benchmarking: Can be used as a reference model for comparing DPO-tuned models against other fine-tuning approaches.