Overview
argilla/distilabeled-OpenHermes-2.5-Mistral-7B is a 7-billion-parameter language model developed by Argilla. It is a DPO (Direct Preference Optimization) fine-tune of OpenHermes-2.5-Mistral-7B, distinguished by its use of a "distilabeled" version of the Intel/orca_dpo_pairs dataset. That dataset was improved using distilabel and GPT-4-1106-preview to re-evaluate and filter the preference pairs, addressing a limitation of the original dataset, in which the GPT-4/3.5-turbo responses were always assumed to be superior.
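The re-evaluation step can be sketched as follows. This is a minimal illustration of the idea, not the actual distilabel pipeline: the `judge` callable and the `prompt`/`chosen`/`rejected` field names are assumptions for the example.

```python
def relabel_pairs(pairs, judge):
    """Re-rate preference pairs with an LLM judge (GPT-4-1106-preview in
    the card) instead of assuming the original "chosen" answer wins.

    `judge(prompt, response)` returning a numeric rating is a hypothetical
    interface. Ties carry no DPO learning signal and are dropped; pairs
    the judge disagrees with are swapped rather than discarded.
    """
    kept = []
    for pair in pairs:
        chosen_score = judge(pair["prompt"], pair["chosen"])
        rejected_score = judge(pair["prompt"], pair["rejected"])
        if chosen_score == rejected_score:
            continue  # tie: drop the pair
        if chosen_score < rejected_score:
            # Judge prefers the originally "rejected" answer: swap the labels
            pair = {**pair, "chosen": pair["rejected"], "rejected": pair["chosen"]}
        kept.append(pair)
    return kept
```

Dropping ties and correcting swapped preferences is what shrinks the dataset from 12,859 pairs to the 5,922 used for training.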
Key Capabilities
- Enhanced Alignment: Achieves better alignment through a refined DPO dataset, where preference pairs were re-evaluated and filtered based on GPT-4's judgment, including identifying ties and swapped preferences.
- Improved Performance: Outperforms mlabonne/NeuralHermes-2.5-Mistral-7B and the base teknium/OpenHermes-2.5-Mistral-7B on several benchmarks, including AGIEval, GPT4All, and GSM8K, demonstrating superior instruction-following and reasoning.
- Reproducible Training: Built using a reproducible DPO recipe, emphasizing the impact of data quality on model performance.
- Efficient Training: Trained for 200 steps on a filtered dataset of 5,922 samples (down from the original 12,859), on a single A100 40GB GPU in under an hour.
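A training run along these lines can be sketched with TRL's `DPOTrainer`. This is an illustrative configuration under assumptions, not the authors' verbatim script: the TRL interface shown here has changed across versions, the dataset identifier and hyperparameters other than the 200 steps are assumed, and the filtered pairs are assumed to have `prompt`/`chosen`/`rejected` columns.

```python
# Sketch of a DPO fine-tuning recipe (assumed TRL interface; verify
# against the installed trl version before running).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "teknium/OpenHermes-2.5-Mistral-7B"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical dataset id for the filtered "distilabeled" pairs.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

trainer = DPOTrainer(
    model,
    args=TrainingArguments(
        output_dir="distilabeled-openhermes",
        max_steps=200,                 # the card reports 200 steps
        per_device_train_batch_size=4, # assumed; not stated in the card
        learning_rate=5e-5,            # assumed; not stated in the card
    ),
    beta=0.1,                          # DPO temperature (common default)
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```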
Good For
- General-purpose conversational AI: Excels in generating high-quality, aligned responses for various prompts.
- Instruction-following tasks: Benefits from the DPO fine-tuning on a preference dataset, leading to more accurate and preferred outputs.
- Research and development: Provides a strong baseline for further experimentation with DPO and dataset quality improvements.
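For conversational use, the OpenHermes-2.5 family expects ChatML-formatted prompts. A minimal sketch of building such a prompt by hand (verify the exact template via the model tokenizer's `apply_chat_template`):

```python
def chatml_prompt(system, user):
    """Format a single-turn conversation in ChatML, the prompt format
    used by OpenHermes-2.5-based models (check the tokenizer's chat
    template to confirm for this fine-tune)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a helpful assistant.",
    "Explain DPO in one sentence.",
)
```

In practice, the resulting prompt would be passed to a `transformers` text-generation pipeline loaded with `argilla/distilabeled-OpenHermes-2.5-Mistral-7B`.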