kwchoi/DPO_mistral_7b_ultra_0124_v1
The kwchoi/DPO_mistral_7b_ultra_0124_v1 is a 7 billion parameter Mistral-Instruct model fine-tuned by kwchoi using the Orca DPO dataset. This model explores the effects of Direct Preference Optimization (DPO) on the Mistral-7B-Instruct-v0.2 base model. It achieves an average score of 64.45 on the Open LLM Leaderboard, demonstrating capabilities in reasoning, common sense, and language understanding tasks. The model is suitable for research into DPO fine-tuning and general instruction-following applications.
Model Overview
The kwchoi/DPO_mistral_7b_ultra_0124_v1 is a 7 billion parameter language model developed by kwchoi. It is based on the Mistral-7B-Instruct-v0.2 architecture and has been fine-tuned using the Direct Preference Optimization (DPO) method with the Orca DPO dataset. The primary purpose of this model is to study the effects and efficacy of DPO fine-tuning on an existing instruction-tuned base model.
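For readers unfamiliar with the method, the standard DPO objective can be sketched for a single preference pair. This is a minimal illustration of the published DPO loss, not code from this model's training run; the `beta=0.1` default and the function name are assumptions for illustration.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are the summed log-probabilities of each response under the
    policy being trained and the frozen reference model (here, the
    instruction-tuned base). beta controls how strongly the policy is
    pushed away from the reference; 0.1 is an illustrative default.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(logits), computed stably as log(1 + exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference exactly, the loss is log(2); as the policy assigns relatively more probability to the chosen response than the rejected one, the loss falls toward zero.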
Performance Metrics
Evaluated on the Open LLM Leaderboard, this model demonstrates balanced performance across various benchmarks:
- Average Score: 64.45
- AI2 Reasoning Challenge (25-Shot): 66.13
- HellaSwag (10-Shot): 86.39
- MMLU (5-Shot): 59.78
- TruthfulQA (0-Shot): 69.45
- Winogrande (5-Shot): 79.48
- GSM8k (5-Shot): 25.47
These scores indicate proficiency in tasks requiring reasoning, common sense, and general knowledge, while the notably lower GSM8k score suggests mathematical problem-solving is an area with room for improvement.
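The leaderboard average is the unweighted mean of the six benchmark scores listed above, which can be checked directly:

```python
# Open LLM Leaderboard scores for kwchoi/DPO_mistral_7b_ultra_0124_v1
scores = {
    "ARC (25-Shot)": 66.13,
    "HellaSwag (10-Shot)": 86.39,
    "MMLU (5-Shot)": 59.78,
    "TruthfulQA (0-Shot)": 69.45,
    "Winogrande (5-Shot)": 79.48,
    "GSM8k (5-Shot)": 25.47,
}

# Unweighted mean across the six benchmarks
average = sum(scores.values()) / len(scores)
```

This reproduces the reported average of 64.45.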
Key Characteristics
- Base Model: Mistral-7B-Instruct-v0.2
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Dataset: Orca DPO dataset
- Context Length: 4096 tokens
Good For
- Research into DPO: Ideal for developers and researchers interested in understanding the impact of DPO on instruction-following models.
- General Instruction Following: Suitable for tasks requiring the model to adhere to given instructions.
- Benchmarking: Can be used as a reference model for comparing DPO-tuned models against other fine-tuning approaches.
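Since the model inherits the Mistral-Instruct chat format from its base model, instructions are typically wrapped in `[INST] ... [/INST]` markers. Below is a minimal prompt-building sketch under that assumption; the helper name is illustrative, and in practice the tokenizer's own chat template should be preferred.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a single user instruction in the Mistral-Instruct chat format.

    The base model expects user turns delimited by [INST] ... [/INST],
    with <s> marking the beginning of the sequence.
    """
    return f"<s>[INST] {instruction} [/INST]"
```

A formatted prompt such as `build_prompt("Summarize this paragraph.")` can then be tokenized and passed to the model for generation.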