Model Overview
kwchoi/DPO_mistral_v01_7b_ultra_0130_1k is a 7-billion-parameter language model developed by kwchoi. It is based on Mistral-7B-Instruct-v0.2 and was fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. The primary goal of this model's development was to investigate the impact and effectiveness of DPO applied to an already instruction-tuned Mistral base model.
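To make the training objective concrete, a minimal sketch of the per-pair DPO loss follows. It is written in plain Python with illustrative log-probability values; the function name and the toy numbers are assumptions for exposition, not taken from this model's training code.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trained policy or the frozen reference
    model. The loss is -log(sigmoid(beta * margin)).
    """
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log sigmoid(margin), written stably as softplus(-margin)
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# Toy numbers: the policy has drifted toward the chosen response,
# so the margin is positive and the loss falls below log(2).
loss = dpo_loss(policy_chosen_lp=-12.0, policy_rejected_lp=-20.0,
                ref_chosen_lp=-14.0, ref_rejected_lp=-18.0, beta=0.1)
```

When policy and reference agree exactly, the margin is zero and the loss sits at log(2); training pushes the margin positive, driving the loss toward zero.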
Key Capabilities & Performance
This model demonstrates general language understanding and reasoning abilities, as evaluated on the Hugging Face Open LLM Leaderboard, where it achieved an overall average score of 57.83. The per-benchmark scores:
- AI2 Reasoning Challenge (25-shot): 57.17
- HellaSwag (10-shot): 79.16
- MMLU (5-shot): 55.85
- TruthfulQA (0-shot): 55.62
- Winogrande (5-shot): 72.85
- GSM8k (5-shot): 26.31
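The reported average is simply the unweighted mean of the six benchmark scores above, which the following check reproduces:

```python
# Reproduce the leaderboard average from the six benchmark scores above.
scores = {
    "ARC (25-shot)": 57.17,
    "HellaSwag (10-shot)": 79.16,
    "MMLU (5-shot)": 55.85,
    "TruthfulQA (0-shot)": 55.62,
    "Winogrande (5-shot)": 72.85,
    "GSM8k (5-shot)": 26.31,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 57.83
```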
Intended Use Cases
This model is suited to research and experimentation, particularly for studying the effects of DPO fine-tuning on instruction-following models. Its benchmark performance suggests it can also serve general-purpose conversational AI, text generation, and reasoning tasks, especially where a 7B-parameter model is preferred for efficiency or deployment constraints.
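Since the base model is Mistral-7B-Instruct-v0.2, prompts should follow the Mistral `[INST] ... [/INST]` instruction format. A minimal sketch of preparing such a prompt is shown below; the `transformers` calls are left as comments so the sketch stays self-contained, and the generation settings shown there are illustrative assumptions, not settings verified against this checkpoint.

```python
def build_mistral_prompt(user_message: str) -> str:
    """Wrap a user message in the [INST] tags used by
    Mistral-Instruct-style models (the base of this model)."""
    return f"<s>[INST] {user_message.strip()} [/INST]"

prompt = build_mistral_prompt("Summarize what DPO fine-tuning does.")

# With transformers installed, inference would look roughly like this
# (sketch only; downloads the ~7B checkpoint):
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   repo = "kwchoi/DPO_mistral_v01_7b_ultra_0130_1k"
#   tok = AutoTokenizer.from_pretrained(repo)
#   model = AutoModelForCausalLM.from_pretrained(repo)
#   inputs = tok(prompt, return_tensors="pt")
#   out = model.generate(**inputs, max_new_tokens=128)
#   print(tok.decode(out[0], skip_special_tokens=True))
```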