sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-recovered
The sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-recovered model is a 7-billion-parameter causal language model based on the Mistral architecture, fine-tuned with Direct Preference Optimization (DPO). It was trained with a specific LoRA configuration and is aimed at conversational applications that require nuanced, well-aligned responses.
Model Overview
sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-recovered is a 7-billion-parameter language model built on the Mistral architecture. It has been fine-tuned with Direct Preference Optimization (DPO) to improve its conversational quality and alignment.
Training Details
The model was fine-tuned with LoRA (Low-Rank Adaptation) using r=16, lora_alpha=16, and lora_dropout=0.05. Training ran for 3922 steps with gradient checkpointing, a learning rate of 5e-7, and the paged_adamw_32bit optimizer. The DPO stage used beta=0.1, with max_prompt_length=1024 and max_length=1536.
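The hyperparameters above map naturally onto the Hugging Face PEFT and TRL libraries, which are commonly used for LoRA-based DPO fine-tuning. The sketch below shows one plausible way to express this configuration; the card does not state which framework was actually used, the output directory is a placeholder, and exact argument names vary across trl versions:

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA settings stated in the card
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# DPO training settings stated in the card
training_args = DPOConfig(
    output_dir="dpo-out",            # placeholder, not from the card
    beta=0.1,
    learning_rate=5e-7,
    max_steps=3922,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    max_prompt_length=1024,
    max_length=1536,
)
```

These two objects would then be passed to TRL's DPOTrainer together with the base model, a frozen reference model, and a preference dataset.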
Key Characteristics
- Architecture: Mistral-7B base model.
- Fine-tuning Method: Direct Preference Optimization (DPO) for improved alignment and response quality.
- LoRA Configuration: LoRA parameters (r=16, lora_alpha=16) for parameter-efficient fine-tuning.
- Context Length: Supports a context length of 4096 tokens.
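At the core of DPO is a simple preference loss: it pushes the policy to assign a higher implicit reward (log-probability margin over a frozen reference model) to the chosen response than to the rejected one, scaled by beta. A minimal pure-Python sketch of that loss, using the beta=0.1 reported above (the log-probability values are illustrative only):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the total log-probability the policy (or the
    frozen reference) model assigns to the chosen/rejected response.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written stably via log1p
    return math.log1p(math.exp(-logits))

# At initialization (policy == reference) the loss is exactly log(2);
# it falls below log(2) once the policy prefers the chosen response
# more strongly than the reference does.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # prints 0.6931
```

Training on many such pairs shifts probability mass toward preferred responses without a separate reward model, which is what makes DPO attractive for alignment fine-tuning.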
Good For
- Conversational AI: Ideal for chatbots and interactive agents requiring aligned and coherent responses.
- Preference-based Fine-tuning: Demonstrates the application of DPO for enhancing model behavior.
- Research: Useful for researchers studying DPO techniques and their impact on Mistral-based models. Further details can be found in the associated arXiv paper.
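For the conversational use cases above, the model can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch: the prompt and generation settings are illustrative, not taken from the model card, and a GPU with sufficient memory is assumed for a 7B model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-recovered"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Keep prompt + generated tokens inside the 4096-token context window.
prompt = "Explain Direct Preference Optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```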