NousResearch/Nous-Hermes-2-Mistral-7B-DPO
NousResearch/Nous-Hermes-2-Mistral-7B-DPO is a 7 billion parameter instruction-tuned language model developed by NousResearch, based on the Mistral architecture. It was fine-tuned from Teknium's OpenHermes-2.5-Mistral-7B using Direct Preference Optimization (DPO), and shows improved performance on the AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA benchmarks. It is optimized for general-purpose conversational AI and instruction following, and supports a 4096-token context length.
Nous Hermes 2 - Mistral 7B - DPO Overview
NousResearch's Nous-Hermes-2-Mistral-7B-DPO is a 7 billion parameter instruction-tuned model, representing the flagship 7B Hermes iteration. It was developed by applying Direct Preference Optimization (DPO) to Teknium's OpenHermes-2.5-Mistral-7B, which itself was trained on 1,000,000 high-quality instructions and chat examples, primarily synthetic data.
Key Capabilities & Improvements
- Enhanced Performance: Demonstrates across-the-board improvements on benchmarks including AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA compared to its OpenHermes 2.5 predecessor.
- Instruction Following: Fine-tuned on extensive instruction and chat datasets for robust conversational abilities.
- ChatML Format: Utilizes the ChatML prompt format, enabling structured multi-turn dialogue and system prompt steerability, similar to OpenAI's API.
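Because the model expects ChatML, prompts must wrap each message in `<|im_start|>`/`<|im_end|>` delimiters. A minimal sketch of assembling such a prompt (the helper name and signature are illustrative, not part of the model's tooling):

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts into a ChatML prompt string.

    ChatML wraps each message as:
        <|im_start|>{role}\n{content}<|im_end|>\n
    Appending an open assistant turn cues the model to generate its reply.
    """
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave the assistant turn open so generation continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {"role": "system", "content": "You are Hermes 2."},
    {"role": "user", "content": "Hello!"},
]
prompt = build_chatml_prompt(messages)
```

The system message at the start of the conversation is what enables the steerability mentioned above; in practice, tokenizers that ship a chat template can produce the same string automatically.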
Benchmark Highlights
- GPT4All Average: 73.72
- AGIEval Average: 43.63
- BigBench Average: 41.94
- TruthfulQA (mc2): 0.5642 (reported on a 0-1 scale)
Recommended Use Cases
This model is well-suited for general-purpose conversational AI, instruction following, and applications requiring structured output such as JSON generation, as demonstrated in its example outputs. Its DPO fine-tuning aims for more aligned and helpful responses.
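When using the model for structured output, the generated text may wrap the JSON in conversational filler, so callers typically extract and parse the object themselves. A minimal sketch of one way to do that (the helper is a hypothetical convenience, not part of the model's tooling, and assumes the response contains a single JSON object):

```python
import json


def extract_json(response: str) -> dict:
    """Extract and parse the first JSON object embedded in a model response.

    Assumes the response contains exactly one object; takes the span from the
    first '{' to the last '}', so trailing prose with stray braces would break it.
    """
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(response[start:end + 1])


# Example: a hypothetical model response with surrounding chat text.
response = 'Sure! Here is the requested data:\n{"name": "Hermes", "params_b": 7}'
data = extract_json(response)
```

For stricter guarantees, production code would validate the parsed object against a schema rather than trusting the model's formatting.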