Model Overview
kwchoi/DPO_mistral_7b_ultra_0124_v1 is a 7-billion-parameter language model developed by kwchoi. It is based on Mistral-7B-Instruct-v0.2 and was fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. The model's primary purpose is to study the effect and efficacy of DPO fine-tuning on an already instruction-tuned base model.
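DPO trains directly on preference pairs by pushing the policy to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal sketch of the per-example objective (variable names are illustrative; this is not the model's actual training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    # Log-ratio of policy vs. reference for the chosen and rejected responses.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable -log(sigmoid(x)) == log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))

# If the policy has not shifted relative to the reference, the loss is log(2);
# it decreases as the policy prefers the chosen response more strongly.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

The `beta` hyperparameter controls how far the policy may drift from the reference model while fitting the preferences.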
Performance Metrics
Evaluated on the Open LLM Leaderboard, the model shows balanced performance across the benchmarks:
- Average Score: 64.45
- AI2 Reasoning Challenge (25-shot): 66.13
- HellaSwag (10-shot): 86.39
- MMLU (5-shot): 59.78
- TruthfulQA (0-shot): 69.45
- Winogrande (5-shot): 79.48
- GSM8k (5-shot): 25.47
These scores indicate proficiency in tasks requiring reasoning, common sense, and general knowledge, while the low GSM8k score marks mathematical problem-solving as the main area for improvement.
Key Characteristics
- Base Model: Mistral-7B-Instruct-v0.2
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Dataset: Orca DPO dataset
- Context Length: 4096 tokens
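Since the base model is Mistral-7B-Instruct-v0.2, prompts are expected in Mistral's `[INST]` chat format (in practice the tokenizer's `apply_chat_template` produces this). A minimal sketch of the template, with an illustrative helper name:

```python
def build_mistral_prompt(turns):
    """Format alternating (user, assistant) turns in Mistral's [INST] template.

    `turns` is a list of (user_message, assistant_reply) pairs; pass None as
    the final reply to leave the prompt open for the model to generate.
    """
    prompt = "<s>"
    for user_msg, assistant_reply in turns:
        prompt += f"[INST] {user_msg} [/INST]"
        if assistant_reply is not None:
            prompt += f" {assistant_reply}</s>"
    return prompt

print(build_mistral_prompt([("Summarize DPO in one sentence.", None)]))
# <s>[INST] Summarize DPO in one sentence. [/INST]
```

Keep the full formatted conversation within the 4096-token context window; earlier turns must be truncated once that budget is exceeded.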
Good For
- Research into DPO: Ideal for developers and researchers interested in understanding the impact of DPO on instruction-following models.
- General Instruction Following: Suitable for tasks requiring the model to adhere to given instructions.
- Benchmarking: Can be used as a reference model for comparing DPO-tuned models against other fine-tuning approaches.