LHC88/LaseredHermes-7B
LHC88/LaseredHermes-7B is a 7 billion parameter language model based on Teknium's OpenHermes-2.5-Mistral-7B, fine-tuned using Direct Preference Optimization (DPO). This model specializes in multi-turn chat dialogue with strong system prompt adherence, utilizing the ChatML format for OpenAI endpoint compatibility. It is optimized for conversational AI applications, offering structured interaction and improved instruction following over previous versions.
LaseredHermes-7B: DPO-Fine-tuned Chat Model
LaseredHermes-7B is a 7-billion-parameter, second-generation RL fine-tuned variant of Teknium's OpenHermes-2.5-Mistral-7B. It applies Direct Preference Optimization (DPO) to the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned datasets. This version addresses data contamination issues present in its predecessor, ensuring a cleaner training process.
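To give a sense of what DPO optimizes, here is a minimal sketch of the per-pair DPO loss in plain Python. This is an illustrative reimplementation, not the model's actual training code (which would typically use a library such as TRL); the inputs are summed token log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair, from summed token log-probs."""
    # Implicit reward of each response: how much more the policy
    # likes it than the frozen reference model does.
    chosen_margin = policy_chosen_lp - ref_chosen_lp
    rejected_margin = policy_rejected_lp - ref_rejected_lp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))

# When policy and reference agree, the loss is log(2); it shrinks as
# the policy favors the chosen response more strongly than the
# reference does.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

The `beta` hyperparameter controls how far the policy may drift from the reference model; 0.1 here is just a common illustrative default, not necessarily the value used to train this model.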
Key Capabilities
- Enhanced Chat Dialogue: Utilizes the ChatML format, enabling structured multi-turn conversations.
- Strong System Prompt Adherence: Designed to more effectively engage with and follow instructions provided in system prompts.
- OpenAI API Compatibility: The ChatML format ensures compatibility with OpenAI's API structure, familiar to ChatGPT users.
- Improved Instruction Following: Benefits from DPO fine-tuning, leading to better response quality and instruction execution.
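The ChatML layout used for all of the above can be sketched as a small helper (the function name is illustrative; in practice `tokenizer.apply_chat_template` in Transformers produces this for you). Each turn is wrapped in `<|im_start|>role ... <|im_end|>` markers, and the prompt ends with an open assistant turn to cue generation.

```python
def to_chatml(messages):
    """Render OpenAI-style {role, content} messages as a ChatML prompt."""
    parts = []
    for m in messages:
        # Each turn: <|im_start|>role\ncontent<|im_end|>
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Trailing open assistant turn tells the model to respond next.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
])
```

Because the message list matches OpenAI's chat schema, the same request payload can drive either this model or an OpenAI-compatible endpoint.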
Good For
- Conversational AI: Ideal for chatbots and interactive agents requiring structured dialogue.
- Applications needing System Prompts: Excels in scenarios where detailed system instructions are crucial for guiding model behavior.
- Developers familiar with OpenAI API: Offers a familiar prompting interface for seamless integration.
- Research in DPO and RLHF: Provides a robust base for further experimentation with preference-based fine-tuning.