argilla/distilabeled-OpenHermes-2.5-Mistral-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Jan 9, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

argilla/distilabeled-OpenHermes-2.5-Mistral-7B is a 7 billion parameter language model, fine-tuned using Direct Preference Optimization (DPO) on a refined version of the Intel/orca_dpo_pairs dataset. Developed by Argilla, this model is a fine-tune of teknium/OpenHermes-2.5-Mistral-7B and is specifically optimized for improved response quality and alignment. It demonstrates enhanced performance across several benchmarks, making it suitable for general-purpose conversational AI and instruction-following tasks.


Overview

argilla/distilabeled-OpenHermes-2.5-Mistral-7B is a 7 billion parameter language model developed by Argilla. It is a DPO (Direct Preference Optimization) fine-tune of the OpenHermes-2.5-Mistral-7B model, distinguished by its use of a "distilabeled" version of the Intel/orca_dpo_pairs dataset. This dataset was meticulously improved using distilabel and GPT-4-1106-preview to re-evaluate and filter preference pairs, addressing limitations of the original dataset where GPT-4/3.5-turbo responses were always assumed to be superior.
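The re-rating step described above can be illustrated with a minimal sketch: each pair's "chosen" and "rejected" responses receive judge scores, ties are dropped, and pairs where the "rejected" response actually scores higher are swapped. The function name, field names, and scores below are hypothetical; the real pipeline uses distilabel with GPT-4-1106-preview as the judge.

```python
# Illustrative sketch of the tie-filtering and preference-swapping logic
# (field names and scores are hypothetical, not distilabel's actual schema).

def refine_pairs(pairs):
    """Drop tied pairs and swap pairs whose 'rejected' response scored higher."""
    refined = []
    for p in pairs:
        if p["chosen_score"] == p["rejected_score"]:
            continue  # judge rated the two responses a tie: drop the pair
        if p["chosen_score"] < p["rejected_score"]:
            # the originally "rejected" response was actually preferred: swap
            p = {**p,
                 "chosen": p["rejected"], "rejected": p["chosen"],
                 "chosen_score": p["rejected_score"],
                 "rejected_score": p["chosen_score"]}
        refined.append(p)
    return refined

pairs = [
    {"chosen": "A", "rejected": "B", "chosen_score": 9, "rejected_score": 6},  # kept
    {"chosen": "C", "rejected": "D", "chosen_score": 7, "rejected_score": 7},  # tie, dropped
    {"chosen": "E", "rejected": "F", "chosen_score": 4, "rejected_score": 8},  # swapped
]
print(refine_pairs(pairs))
```

Applied to the original 12,859 Intel/orca_dpo_pairs samples, this kind of filtering left the 5,922 samples used for training.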

Key Capabilities

  • Enhanced Alignment: Achieves better alignment through a refined DPO dataset, where preference pairs were re-evaluated and filtered based on GPT-4's judgment, including identifying ties and swapped preferences.
  • Improved Performance: Outperforms mlabonne/NeuralHermes-2.5-Mistral-7B and the base teknium/OpenHermes-2.5-Mistral-7B on several benchmarks, including AGIEval, GPT4All, and GSM8K, demonstrating superior instruction-following and reasoning.
  • Reproducible Training: Built using a reproducible DPO recipe, emphasizing the impact of data quality on model performance.
  • Efficient Training: Trained on a filtered dataset of 5,922 samples (from an original 12,859) for 200 steps, utilizing a single A100 40GB GPU for less than an hour.
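For reference, the DPO objective used in this kind of recipe penalizes the policy when its preference margin over the reference model favors the rejected response. A minimal per-pair sketch of the standard DPO loss, given sequence log-probabilities (the function name and the β default are illustrative):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)))."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; as the policy's relative preference for the chosen response grows, the loss falls toward zero.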

Good For

  • General-purpose conversational AI: Excels in generating high-quality, aligned responses for various prompts.
  • Instruction-following tasks: Benefits from the DPO fine-tuning on a preference dataset, leading to more accurate and preferred outputs.
  • Research and development: Provides a strong baseline for further experimentation with DPO and dataset quality improvements.
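For chat and instruction-following use, prompts are typically rendered in the ChatML format of the base OpenHermes-2.5-Mistral-7B model (assumption: the fine-tune inherits the base model's chat template; in practice `tokenizer.apply_chat_template` handles this). A minimal sketch of the prompt construction:

```python
# Sketch of ChatML prompt rendering (assumes the model uses the base
# OpenHermes-2.5 ChatML template; helper name is illustrative).

def build_chatml_prompt(messages):
    """Render {role, content} messages as ChatML, ending with an open
    assistant turn so the model generates the reply."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
])
print(prompt)
```

The resulting string can be tokenized and passed to the model's `generate` method, stopping on the `<|im_end|>` token.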