openaccess-ai-collective/DPOpenHermes-7B

Text Generation · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 2, 2023 · License: apache-2.0 · Architecture: Transformer

DPOpenHermes-7B is a 7 billion parameter language model developed by openaccess-ai-collective, fine-tuned from Teknium's OpenHermes-2.5-Mistral-7B using Direct Preference Optimization (DPO). It is optimized for multi-turn chat dialogue and structured system prompts, utilizing the ChatML format. This model is particularly suited for applications requiring robust conversational AI with explicit system instructions.


DPOpenHermes-7B: DPO Fine-tuned Chat Model

DPOpenHermes-7B is a 7 billion parameter model developed by openaccess-ai-collective, built upon Teknium's OpenHermes-2.5-Mistral-7B. It was further refined with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences datasets, trained using qLoRA on a single H100 80GB GPU for approximately 10 hours.
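DPO trains directly on preference pairs rather than fitting a separate reward model. A minimal per-example sketch of the loss, assuming summed log-probabilities for the chosen and rejected completions under the policy and the frozen reference model (variable names are illustrative; the actual run used qLoRA and the datasets above):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid of the scaled preference margin.

    The margin compares how much the policy favors the chosen completion
    over the rejected one, relative to the reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# With no margin, the loss is -log(0.5) ≈ 0.693; it shrinks as the
# policy increasingly prefers the chosen completion.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

As the policy's preference for the chosen response grows relative to the reference, the margin rises and the loss falls toward zero, which is what pushes the model toward the human-preferred answers in the pair datasets.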

Key Capabilities

  • Enhanced Chat Dialogue: Fine-tuned for multi-turn conversations, supporting structured system prompts.
  • ChatML Format: Utilizes the ChatML prompt format, ensuring compatibility with OpenAI API standards and enabling explicit system instructions.
  • System Prompt Utilization: Designed to strongly engage with system prompts, allowing for more consistent and controlled model behavior across multiple turns.
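The ChatML format wraps each turn in `<|im_start|>`/`<|im_end|>` tokens. A minimal sketch of how a message list maps to a ChatML prompt (the helper name `to_chatml` and the example messages are illustrative, not part of the model's API):

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt,
    ending with an open assistant turn for the model to complete."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Direct Preference Optimization?"},
]
print(to_chatml(messages))
```

In practice the tokenizer's built-in chat template (e.g. `tokenizer.apply_chat_template` in transformers) handles this rendering; the sketch just makes the wire format explicit.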

Benchmarks

An initial DPO-only version had issues generating the EOS token; this release adds a round of Supervised Fine-Tuning (SFT) to fix that, at the cost of slightly lower benchmark scores. Notable average scores include:

  • AGIEval: 0.4364
  • BigBench Hard: 0.4321
  • GPT4All: 0.7422

Good For

  • Applications requiring structured, multi-turn chat interactions.
  • Developers familiar with OpenAI's ChatML format seeking a 7B parameter alternative.
  • Use cases where explicit system instructions are crucial for guiding model responses.