openaccess-ai-collective/DPOpenHermes-7B-v2

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 6, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

DPOpenHermes-7B-v2 is a 7-billion-parameter, Mistral-7B-based language model developed by openaccess-ai-collective, fine-tuned using Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets. This version addresses data contamination present in its predecessor and focuses on improved instruction following and multi-turn chat dialogue. It uses the ChatML prompt format, which mirrors the OpenAI API message structure and makes strong use of system prompts.


DPOpenHermes-7B-v2: DPO Fine-tuned Mistral-7B

DPOpenHermes-7B-v2 is a 7-billion-parameter model built on Teknium's OpenHermes-2.5-Mistral-7B. Developed by openaccess-ai-collective, it undergoes a second phase of fine-tuning using Direct Preference Optimization (DPO), leveraging the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets. It is distinguished from the v1 model by training on a decontaminated dataset.

Key Capabilities & Features

  • Direct Preference Optimization (DPO): Enhanced instruction following and response quality through DPO fine-tuning.
  • ChatML Prompt Format: Supports structured multi-turn chat dialogue, including effective system prompts, aligning with OpenAI API compatibility.
  • System Prompt Utilization: Designed to strongly engage with system instructions that span multiple turns, offering greater control over model behavior.
  • Training Details: Trained on a single H100 80GB GPU for approximately 13 hours (1 epoch) using 16-bit LoRA.
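To make the ChatML format concrete, the sketch below builds a prompt string in the `<|im_start|>`/`<|im_end|>` turn markup that ChatML models expect. The helper `build_chatml_prompt` is a hypothetical illustration, not part of the model's tooling; in practice the tokenizer's chat template (e.g. `tokenizer.apply_chat_template` in Hugging Face Transformers) does this for you.

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} messages in ChatML.

    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers;
    the trailing <|im_start|>assistant line cues the model to respond.
    (Hypothetical helper for illustration only.)
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


# A system prompt followed by a user turn, as in multi-turn chat:
prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is DPO?"},
])
print(prompt)
```

Because the same `system` turn can be re-sent at the start of every request, this structure is what lets the model carry system instructions across an extended dialogue, as the feature list above describes.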

Benchmarks

  • AGIEval Average: 0.4422
  • BigBench Hard Average: 0.4245

Good For

  • Applications requiring robust instruction following and multi-turn conversational capabilities.
  • Developers familiar with OpenAI's ChatML format seeking a similarly structured interaction model.
  • Use cases where system prompts are crucial for guiding the LLM's behavior over extended dialogues.