kwchoi/DPO_mistral_7b_alpaca_0124_v1
kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7-billion-parameter model built on Mistral-7B-Instruct-v0.2 and fine-tuned with Direct Preference Optimization (DPO) on the Orca DPO dataset. It was created to study the effect of DPO on instruction-following capabilities. The model achieves an average score of 61.15 on the Open LLM Leaderboard, with notable results on Winogrande (77.19) and HellaSwag (73.20).
Model Overview
The kwchoi/DPO_mistral_7b_alpaca_0124_v1 is a 7 billion parameter language model developed by kwchoi. It is built upon the Mistral-7B-Instruct-v0.2 architecture and has been fine-tuned using the Direct Preference Optimization (DPO) method. The primary goal of this model's development was to investigate the impact and effectiveness of DPO on instruction-following capabilities, utilizing the Orca DPO dataset for training.
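DPO optimizes the policy directly on preference pairs, with no separate reward model: it pushes the policy to assign a higher likelihood margin to the preferred ("chosen") response than to the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair loss follows; this is the standard DPO formulation, not this model's actual training code, and the function name and default beta are illustrative:

```python
import math

def dpo_loss(pi_logp_chosen: float, pi_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the frozen reference (ref_*).
    """
    # Implicit reward for each response: how much more likely the policy
    # makes it than the reference does, scaled by beta.
    chosen_reward = beta * (pi_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (pi_logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the reward margin: minimized when the policy
    # favors the chosen response relative to the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2; as the policy learns to prefer the chosen response, the loss falls toward zero.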
Key Capabilities & Performance
Evaluated on the Open LLM Leaderboard, the model achieves an average score of 61.15, with its strongest results on Winogrande and HellaSwag and its weakest on GSM8k.
Key benchmark results include:
- AI2 Reasoning Challenge (25-shot): 63.40
- HellaSwag (10-shot): 73.20
- MMLU (5-shot): 60.51
- TruthfulQA (0-shot): 66.76
- Winogrande (5-shot): 77.19
- GSM8k (5-shot): 25.85
Detailed evaluation results are available on the Open LLM Leaderboard.
Good For
- Research into DPO effects: Ideal for researchers and developers interested in understanding how Direct Preference Optimization influences model behavior and performance.
- Instruction-following tasks: Suitable for applications requiring a model to adhere to specific instructions, given its DPO fine-tuning on an instruction-focused dataset.
- General language generation: Can be used for a variety of natural language processing tasks where a 7B parameter model with a 4096 token context length is appropriate.
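For the instruction-following and generation uses above, the model can be loaded with the standard Hugging Face Transformers API. This is a generic sketch, not usage instructions from the model author; it assumes the checkpoint ships a tokenizer with a Mistral-style chat template, and downloading the ~7B weights requires substantial disk space and memory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    model_id = "kwchoi/DPO_mistral_7b_alpaca_0124_v1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" places weights on GPU if one is available.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Mistral-Instruct checkpoints expect the [INST] chat format;
    # apply_chat_template builds it from a messages list.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_reply("Summarize Direct Preference Optimization in one sentence."))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters such as `temperature` can be passed to `generate` for more varied output.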