DPOpenHermes-7B: DPO Fine-tuned Chat Model
DPOpenHermes-7B is a 7-billion-parameter model from openaccess-ai-collective, built on Teknium's OpenHermes-2.5-Mistral-7B. It is further refined with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences datasets, trained with QLoRA on a single H100 80GB GPU for approximately 10 hours.
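DPO optimizes the policy directly on preference pairs, without a separate reward model: for each pair it pushes the policy's implicit reward for the chosen response above that of the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair objective (the function name and the beta value are illustrative, not taken from this model's training config):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin: the loss shrinks as the
    # policy widens the gap in favor of the chosen response.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no margin the loss is log(2); a positive margin lowers it.
print(dpo_loss(-10.0, -12.0, -11.0, -11.5))
```

In practice this is averaged over a batch of pairs from the preference datasets listed above, with gradients flowing only through the policy's log-probabilities.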
Key Capabilities
- Enhanced Chat Dialogue: Fine-tuned for multi-turn conversations, supporting structured system prompts.
- ChatML Format: Uses the ChatML prompt format (the multi-turn convention popularized by OpenAI's chat models), enabling explicit system instructions.
- System Prompt Utilization: Designed to strongly engage with system prompts, allowing for more consistent and controlled model behavior across multiple turns.
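In ChatML, each turn is wrapped in `<|im_start|>` / `<|im_end|>` tokens with a role header, and generation is prompted by an open assistant turn. A minimal sketch of prompt assembly (the helper name is hypothetical; real pipelines typically use the tokenizer's built-in chat template):

```python
def build_chatml_prompt(system, turns):
    """Assemble a ChatML prompt from a system message and (role, content) turns.

    The final assistant turn is left open so the model continues from there.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, content in turns:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    # Open an assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    [("user", "What is Direct Preference Optimization?")],
)
print(prompt)
```

The system block at the top is what lets explicit instructions persist across multiple turns of dialogue.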
Benchmarks
An initial DPO-only version had trouble reliably emitting the end-of-sequence (EOS) token; this release adds a round of Supervised Fine-Tuning (SFT) to fix that, at a slight cost to benchmark scores. Notable average scores include:
- AGIEval: 0.4364
- BigBench Hard: 0.4321
- GPT4All: 0.7422
Good For
- Applications requiring structured, multi-turn chat interactions.
- Developers familiar with OpenAI's ChatML format seeking a 7B parameter alternative.
- Use cases where explicit system instructions are crucial for guiding model responses.