openaccess-ai-collective/DPOpenHermes-7B-v2
DPOpenHermes-7B-v2 is a 7 billion parameter Mistral-7B based language model developed by openaccess-ai-collective, fine-tuned using Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets. This version addresses data contamination issues present in its predecessor, focusing on improved instruction following and multi-turn chat dialogue. It uses the ChatML prompt format, making it compatible with OpenAI-style message structures and responsive to system prompts.
DPOpenHermes-7B-v2: DPO Fine-tuned Mistral-7B
DPOpenHermes-7B-v2 is a 7 billion parameter model built upon Teknium's OpenHermes-2.5-Mistral-7B. Developed by openaccess-ai-collective, this model undergoes a second phase of fine-tuning using Direct Preference Optimization (DPO). It leverages the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets for reinforcement learning, distinguishing itself from the 'v1' model by using a decontaminated dataset.
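To make the training objective concrete, the DPO loss for a single preference pair can be sketched in plain Python. This is an illustrative formula, not the project's training code; the log-probability inputs and the `beta` default are assumptions for the sketch.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    """
    # Implicit reward: log-ratio of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Bradley-Terry style logistic loss on the reward margin, scaled by beta
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy assigns the chosen response a larger margin over the rejected one (relative to the reference model), the loss shrinks; at zero margin it equals log 2.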
Key Capabilities & Features
- Direct Preference Optimization (DPO): Enhanced instruction following and response quality through DPO fine-tuning.
- ChatML Prompt Format: Supports structured multi-turn chat dialogue, including effective system prompts, aligning with OpenAI API compatibility.
- System Prompt Utilization: Designed to follow system instructions consistently across multiple turns, offering finer control over model behavior.
- Training Details: Trained on a single H100 80GB GPU for approximately 13 hours over one epoch using 16-bit LoRA.
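The ChatML format mentioned above wraps each message in `<|im_start|>`/`<|im_end|>` delimiters. A minimal formatter, shown as a sketch rather than the model's official tokenizer template, might look like this:

```python
def format_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts into a ChatML prompt.

    Roles are typically "system", "user", and "assistant".
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # A trailing assistant header cues the model to generate its reply
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)
```

For example, a system prompt plus one user turn renders as `<|im_start|>system\n...<|im_end|>` followed by the user block and an open assistant header. In practice the tokenizer's built-in chat template (e.g. `tokenizer.apply_chat_template`) should be preferred when available.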
Benchmarks
- AGIEval Average: 0.4422
- BigBench Hard Average: 0.4245
Good For
- Applications requiring robust instruction following and multi-turn conversational capabilities.
- Developers familiar with OpenAI's ChatML format seeking a similarly structured interaction model.
- Use cases where system prompts are crucial for guiding the LLM's behavior over extended dialogues.