wolfeidau/NeuralHermes-2.5-Mistral-7B
NeuralHermes-2.5-Mistral-7B by wolfeidau is a 7 billion parameter Mistral-based language model fine-tuned using Direct Preference Optimization (DPO). This model specializes in instruction following and conversational tasks, leveraging the Intel/orca_dpo_pairs dataset for its DPO training. It is designed for general-purpose assistant chatbot applications, offering enhanced response quality through preference-based learning.
NeuralHermes-2.5-Mistral-7B Overview
NeuralHermes-2.5-Mistral-7B is a 7 billion parameter language model developed by wolfeidau. It is built on the Mistral architecture and was fine-tuned from OpenHermes-2.5 using Direct Preference Optimization (DPO). The DPO training uses the Intel/orca_dpo_pairs dataset, which pairs preferred and rejected responses to align model outputs with human preferences, resulting in improved instruction following and conversational quality.
Key Capabilities
- Enhanced Instruction Following: Benefits from DPO training on preference data, leading to more aligned and helpful responses.
- Conversational AI: Optimized for chatbot applications and interactive dialogue generation.
- Mistral-7B Foundation: Inherits the strong base capabilities of the Mistral-7B model.
Training Details
The model was fine-tuned with LoRA (r=16, lora_alpha=16, lora_dropout=0.05) and optimized with paged_adamw_32bit over 200 steps. DPO training used beta=0.1, with max_prompt_length=1024 and max_length=1536.
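The card does not ship a training script, but the stated hyperparameters can be collected into a configuration sketch. The dictionary keys below mirror the argument names of Hugging Face `peft.LoraConfig` and `trl`'s `DPOTrainer`, which are an assumed (not confirmed) training stack for this model:

```python
# Hypothetical configuration sketch: key names mirror peft.LoraConfig
# and trl's DPOTrainer, which are assumed -- not stated by the card --
# to be the training stack. Values come from the card itself.
lora_config = {
    "r": 16,               # LoRA rank, per the card
    "lora_alpha": 16,      # LoRA scaling numerator
    "lora_dropout": 0.05,  # dropout applied to LoRA layers
    # target_modules are not listed on the card, so left unspecified
}

dpo_config = {
    "beta": 0.1,                    # DPO temperature on the implicit reward
    "max_prompt_length": 1024,      # truncation limit for prompts
    "max_length": 1536,             # truncation limit for prompt + response
    "max_steps": 200,               # total optimization steps
    "optim": "paged_adamw_32bit",   # paged 32-bit AdamW optimizer
}

def lora_scaling(cfg: dict) -> float:
    """LoRA scales its weight update by alpha / r; with these values
    that ratio is 1.0, i.e. the adapter update is applied unscaled."""
    return cfg["lora_alpha"] / cfg["r"]
```

With alpha equal to r, the LoRA update is applied at full strength, a common default when alpha is not tuned separately.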
Good For
- Developing helpful assistant chatbots.
- Applications requiring models with improved alignment to human preferences.
- General-purpose text generation where conversational quality is important.
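For inference, the OpenHermes-2.5 base model uses the ChatML prompt format, and this fine-tune presumably inherits it. A minimal prompt builder, under that assumption:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt (the format used by OpenHermes-2.5,
    which this fine-tune is assumed to inherit) and leave the
    assistant turn open for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Summarize Direct Preference Optimization in one sentence.",
)
```

The resulting string would then be tokenized and passed to the model (e.g. via `transformers`' `model.generate`), with generation stopped at the `<|im_end|>` token.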