openaccess-ai-collective/DPOpenHermes-7B-v2
DPOpenHermes-7B-v2 is a 7 billion parameter Mistral-7B based language model developed by openaccess-ai-collective, fine-tuned using Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets. This version addresses data contamination issues present in its predecessor, focusing on improved instruction following and multi-turn chat dialogue. It uses the ChatML prompt format, making it compatible with OpenAI-style message structures and responsive to system prompts.
DPOpenHermes-7B-v2: DPO Fine-tuned Mistral-7B
DPOpenHermes-7B-v2 is a 7 billion parameter model built upon Teknium's OpenHermes-2.5-Mistral-7B. Developed by openaccess-ai-collective, this model undergoes a second phase of fine-tuning using Direct Preference Optimization (DPO). It leverages the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets for reinforcement learning, distinguishing itself from the 'v1' model by using a decontaminated dataset.
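To make the training objective concrete, the DPO loss for a single preference pair can be sketched in plain Python. This is an illustrative formula, not the project's training code; the log-probability inputs and the `beta` default are assumptions for the sketch.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    """
    # Implicit reward: log-ratio of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Bradley-Terry style logistic loss on the reward margin, scaled by beta
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy assigns the chosen response a larger margin over the rejected one (relative to the reference model), the loss shrinks; at zero margin it equals log 2.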
Key Capabilities & Features
- Direct Preference Optimization (DPO): Enhanced instruction following and response quality through DPO fine-tuning.
- ChatML Prompt Format: Supports structured multi-turn chat dialogue, including effective system prompts, aligning with OpenAI API compatibility.
- System Prompt Utilization: Designed to follow system instructions consistently across multiple turns, offering finer control over model behavior.
- Training Details: Trained on a single H100 80GB GPU for approximately 13 hours over one epoch using 16-bit LoRA.
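The ChatML format mentioned above wraps each message in `<|im_start|>`/`<|im_end|>` delimiters. A minimal formatter, shown as a sketch rather than the model's official tokenizer template, might look like this:

```python
def format_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts into a ChatML prompt.

    Roles are typically "system", "user", and "assistant".
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # A trailing assistant header cues the model to generate its reply
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)
```

For example, a system prompt plus one user turn renders as `<|im_start|>system\n...<|im_end|>` followed by the user block and an open assistant header. In practice the tokenizer's built-in chat template (e.g. `tokenizer.apply_chat_template`) should be preferred when available.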
Benchmarks
- AGIEval Average: 0.4422
- BigBench Hard Average: 0.4245
Good For
- Applications requiring robust instruction following and multi-turn conversational capabilities.
- Developers familiar with OpenAI's ChatML format seeking a similarly structured interaction model.
- Use cases where system prompts are crucial for guiding the LLM's behavior over extended dialogues.