Model Overview
sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-recovered is a 7-billion-parameter language model built on the Mistral architecture. It has been fine-tuned with Direct Preference Optimization (DPO) to improve its conversational capabilities and alignment.
Training Details
The model was fine-tuned with LoRA (Low-Rank Adaptation) using r=16, lora_alpha=16, and lora_dropout=0.05. Training ran for 3922 steps with gradient_checkpointing enabled, a learning_rate of 5e-7, and the paged_adamw_32bit optimizer. The DPO stage used beta=0.1, with a max_prompt_length of 1024 and a max_length of 1536.
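For intuition, the per-pair DPO objective with beta=0.1 can be sketched in plain Python. This is a minimal illustration of the standard DPO loss (not the model's actual training code); the log-probability inputs are assumed to be summed token log-probs for the chosen and rejected responses under the policy and the frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair."""
    # Implicit rewards: how much the policy prefers each response
    # relative to the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin (Bradley-Terry preference).
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

A small beta such as 0.1 softens the penalty on diverging from the reference model, which is consistent with the low learning rate used here.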
Key Characteristics
- Architecture: Mistral-7B base model.
- Fine-tuning Method: Direct Preference Optimization (DPO) for improved alignment and response quality.
- LoRA Configuration: r=16 and lora_alpha=16 for parameter-efficient fine-tuning.
- Context Length: Supports a context length of 4096 tokens.
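The stated LoRA hyperparameters would correspond to a PEFT configuration along these lines. This is a hedged sketch only: the task type and any target modules are assumptions, since the card does not list them:

```python
from peft import LoraConfig

# Sketch of the stated LoRA setup; task_type is an assumption,
# and target_modules are left at the library default because
# the card does not specify them.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

With lora_alpha equal to r, the effective LoRA scaling factor (alpha/r) is 1, a common neutral choice.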
Good For
- Conversational AI: Ideal for chatbots and interactive agents requiring aligned and coherent responses.
- Preference-based Fine-tuning: Demonstrates the application of DPO for enhancing model behavior.
- Research: Useful for researchers studying DPO techniques and their impact on Mistral-based models. Further details can be found in the associated arXiv paper.