mlabonne/NeuralHermes-2.5-Mistral-7B
NeuralHermes-2.5-Mistral-7B by mlabonne is a 7-billion-parameter language model fine-tuned from teknium/OpenHermes-2.5-Mistral-7B with Direct Preference Optimization (DPO). It shows improved results on benchmarks such as AGIEval, GPT4All, and TruthfulQA, making it well suited to general conversational AI and reasoning tasks. The DPO stage used a ChatML-formatted preference dataset, strengthening its instruction-following behavior.
NeuralHermes-2.5-Mistral-7B Overview
NeuralHermes-2.5-Mistral-7B is a 7 billion parameter language model developed by mlabonne, building upon the teknium/OpenHermes-2.5-Mistral-7B base. It has been further fine-tuned using Direct Preference Optimization (DPO) with the mlabonne/chatml_dpo_pairs dataset, a process inspired by the RLHF methodology of Intel's neural-chat-7b-v3-1.
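Because the DPO dataset is ChatML-formatted, prompts sent to the model should follow the same convention. A minimal sketch of assembling such a prompt in Python (the helper name is illustrative, not part of the model card):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-formatted prompt string.

    ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers;
    the trailing '<|im_start|>assistant' cues the model to respond.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "What is a large language model?",
)
print(prompt)
```

The same template applies whether the model is called through `transformers` directly or through a frontend; frontends like LM Studio typically apply it automatically.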
Key Capabilities & Performance
This model shows notable improvements over its base model, becoming one of the top-performing 7B models on the Open LLM leaderboard. Benchmarks indicate enhanced results across:
- AGIEval: Improved from 43.07% to 43.62%
- GPT4All: Improved from 73.12% to 73.25%
- TruthfulQA: Improved relative to the base model (exact figure not given here)
The training used specific LoRA configurations and DPO hyperparameters and took roughly one hour on a single A100 GPU. The training code is publicly available on Google Colab and GitHub.
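To give a sense of what such a run involves, the LoRA and DPO settings can be sketched as plain configuration values. These are representative assumptions for a LoRA + DPO fine-tune of a 7B model, not the actual hyperparameters, which live in the published Colab/GitHub training code:

```python
# Representative (assumed) hyperparameters for a LoRA + DPO fine-tune of a
# 7B model on a single A100. The real values used for NeuralHermes are in
# the published training code and are not reproduced here.
lora_config = {
    "r": 16,               # LoRA rank: size of the low-rank update matrices
    "lora_alpha": 16,      # scaling factor applied to the LoRA updates
    "lora_dropout": 0.05,  # dropout applied inside the LoRA layers
    # adapt only the attention projection matrices
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

dpo_config = {
    "beta": 0.1,           # strength of the KL penalty against the reference model
    "learning_rate": 5e-5,
    "max_steps": 200,
    "per_device_train_batch_size": 4,
}
```

With a stack like `peft` and `trl`, dictionaries of this shape map onto `LoraConfig` and the DPO trainer's arguments; keeping the adapter rank small is what lets the whole run fit on one GPU in about an hour.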
Usage & Availability
NeuralHermes-2.5-Mistral-7B can be run using standard inference pipelines with the transformers library and is compatible with frontends such as LM Studio. Various quantized versions are also available, including GGUF, AWQ, GPTQ, and EXL2 formats, provided by community contributors such as TheBloke and LoneStriker.
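A minimal inference sketch with the Hugging Face `transformers` text-generation pipeline is shown below. The generation settings (`max_new_tokens`, `temperature`) are illustrative assumptions, not values taken from the model card, and running it in practice requires a GPU with enough memory for a 7B model:

```python
# Sketch: running NeuralHermes-2.5-Mistral-7B via the transformers
# text-generation pipeline. Generation settings are illustrative assumptions.

MODEL_ID = "mlabonne/NeuralHermes-2.5-Mistral-7B"

# ChatML prompt, matching the format used during DPO fine-tuning.
PROMPT = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain DPO in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

def generate(prompt: str = PROMPT, max_new_tokens: int = 128) -> str:
    """Lazily load the model and generate a completion (needs a capable GPU)."""
    # Imported here so the sketch stays importable without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    out = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return out[0]["generated_text"]
```

The quantized GGUF/AWQ/GPTQ/EXL2 variants follow the same ChatML prompt format but are loaded through their respective runtimes (e.g. llama.cpp for GGUF) rather than this pipeline.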