kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch
The kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch is a 7-billion-parameter Mistral-Instruct model fine-tuned using Direct Preference Optimization (DPO) on the Orca DPO dataset. Developed by kwchoi, this model explores the effects of DPO on the Mistral-7B-Instruct-v0.2 base model. It achieves an average score of 58.32 on the Open LLM Leaderboard, with particular strengths in HellaSwag (76.78) and Winogrande (73.40).
Model Overview
The kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch is a 7-billion-parameter language model based on the Mistral-7B-Instruct-v0.2 architecture, fine-tuned by kwchoi using Direct Preference Optimization (DPO) with the Orca DPO dataset. The model serves as a study of the impact and effectiveness of DPO on instruction-tuned models.
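To make the fine-tuning objective concrete, the DPO loss for a single preference pair can be sketched in plain Python. This is an illustrative implementation of the standard DPO formulation, not the author's training code; the `beta` value and the example log-probabilities are made-up numbers for demonstration.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Arguments are summed log-probabilities of the chosen/rejected
    responses under the trained policy (pi_*) and the frozen
    reference model (ref_*). beta scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen over the rejected response, relative to the reference.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that has shifted toward the chosen response gets a lower loss
# than one that has shifted toward the rejected response.
loss_aligned = dpo_loss(-5.0, -12.0, -8.0, -9.0)    # positive margin
loss_unaligned = dpo_loss(-12.0, -5.0, -9.0, -8.0)  # negative margin
```

When the policy matches the reference exactly, the margin is zero and the loss reduces to `log 2`, which is the starting point of DPO training.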
Performance Highlights
Evaluated on the Open LLM Leaderboard, this model demonstrates a competitive average performance of 58.32. Notable scores include:
- HellaSwag (10-Shot): 76.78
- Winogrande (5-Shot): 73.40
- AI2 Reasoning Challenge (25-Shot): 55.97
- MMLU (5-Shot): 55.97
- TruthfulQA (0-Shot): 57.94
Use Cases
This model is particularly suitable for:
- Researchers interested in the practical application and effects of Direct Preference Optimization (DPO) on large language models.
- Developers seeking a Mistral-based model with DPO fine-tuning for general instruction-following tasks.
- Applications requiring strong performance in common sense reasoning and question answering, as indicated by its HellaSwag and Winogrande scores.
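For the instruction-following use cases above, prompts should follow the chat template of the Mistral-7B-Instruct-v0.2 base model. The `[INST]` template below is an assumption inherited from the base model, not stated on this card; verify it against the tokenizer's chat template before relying on it.

```python
def build_mistral_prompt(messages):
    """Wrap alternating user/assistant turns in the Mistral [INST] template.

    Assumes the Mistral-7B-Instruct-v0.2 convention: user turns are
    enclosed in [INST] ... [/INST], assistant turns end with </s>.
    """
    prompt = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        else:  # assistant turn, closed with the end-of-sequence token
            prompt += f" {msg['content']}</s>"
    return prompt

prompt = build_mistral_prompt([
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
# → "<s>[INST] Summarize DPO in one sentence. [/INST]"
```

The weights themselves can be loaded with the Hugging Face `transformers` library via `AutoModelForCausalLM.from_pretrained("kwchoi/DPO_mistral_v01_7b_ultra_0131_1k_1epoch")`, and the tokenizer's built-in `apply_chat_template` is the safer way to produce this prompt in practice.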