NousResearch/Nous-Hermes-2-Mistral-7B-DPO

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 18, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

NousResearch/Nous-Hermes-2-Mistral-7B-DPO is a 7 billion parameter instruction-tuned language model developed by NousResearch, based on the Mistral architecture. It was fine-tuned using Direct Preference Optimization (DPO) from Teknium's OpenHermes-2.5-Mistral-7B, and shows improved performance across the AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA benchmarks. It is optimized for general-purpose conversational AI and instruction following, and supports a 4096-token context length.


Nous Hermes 2 - Mistral 7B - DPO Overview

NousResearch's Nous-Hermes-2-Mistral-7B-DPO is a 7 billion parameter instruction-tuned model and the flagship 7B Hermes iteration. It was developed by applying Direct Preference Optimization (DPO) to Teknium's OpenHermes-2.5-Mistral-7B, which itself was trained on 1,000,000 high-quality instruction and chat examples, primarily synthetically generated.

Key Capabilities & Improvements

  • Enhanced Performance: Demonstrates across-the-board improvements on benchmarks including AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA compared to its OpenHermes 2.5 predecessor.
  • Instruction Following: Fine-tuned on extensive instruction and chat datasets for robust conversational abilities.
  • ChatML Format: Utilizes the ChatML prompt format, enabling structured multi-turn dialogue and system prompt steerability, similar to OpenAI's API.
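The ChatML convention mentioned above wraps each turn in `<|im_start|>role ... <|im_end|>` blocks. A minimal sketch of rendering a conversation into that format (the `to_chatml` helper is hypothetical; in practice the model's tokenizer chat template handles this):

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        # Each turn is delimited by <|im_start|>role ... <|im_end|>.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave an open assistant block for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are Hermes 2, a helpful assistant."},
    {"role": "user", "content": "What is Direct Preference Optimization?"},
])
print(prompt)
```

The explicit system block is what gives the format its steerability: the system prompt is a structured turn rather than free text prepended to the user message.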

Benchmark Highlights

  • GPT4All Average: 73.72
  • AGIEval Average: 43.63
  • BigBench Average: 41.94
  • TruthfulQA (mc2): 0.5642

Recommended Use Cases

This model is well-suited for general-purpose conversational AI, instruction following, and applications requiring structured output like JSON generation, as demonstrated in its example outputs. Its DPO fine-tuning aims for more aligned and helpful responses.
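For the structured-output use case, a completion typically ends with the ChatML end-of-turn token, which must be stripped before parsing. A post-processing sketch (the `raw` completion string below is illustrative, not a real model output):

```python
import json

def parse_json_completion(completion):
    """Trim the ChatML end-of-turn marker and parse the JSON body."""
    body = completion.split("<|im_end|>")[0].strip()
    return json.loads(body)

# Illustrative completion from a JSON-generation prompt.
raw = '{"name": "Nous-Hermes-2", "params_b": 7}<|im_end|>'
data = parse_json_completion(raw)
print(data["params_b"])  # 7
```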