sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted
The sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted model is a 7 billion parameter causal language model, fine-tuned using DPO (Direct Preference Optimization) on the OpenHermes-2.5-Mistral-7B base. This model was trained with specific LoRA configurations and optimized for conversational tasks, leveraging a maximum context length of 4096 tokens. Its training methodology suggests a focus on aligning model outputs with human preferences, making it suitable for interactive AI applications.
Model Overview
The sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted is a 7 billion parameter causal language model. It is built upon the OpenHermes-2.5-Mistral-7B base and has undergone further fine-tuning using Direct Preference Optimization (DPO).
Training Details
The model was trained with specific LoRA (Low-Rank Adaptation) hyperparameters, including r=16, lora_alpha=16, and lora_dropout=0.05, targeting key attention and feed-forward projection modules. Training used auto_find_batch_size=True, gradient_checkpointing=True, a learning rate of 5e-7, and the paged_adamw_32bit optimizer over 3922 steps. The DPO phase used beta=0.1, with a maximum prompt length of 1024 tokens and a maximum sequence length of 1536 tokens.
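The hyperparameters above can be expressed as a training configuration sketch using the `peft` and `trl` libraries (the library choice and the exact target module names are assumptions; the card does not state which training stack was used):

```python
# Sketch of the training configuration described above, assuming a
# peft + trl stack; module names assume a Mistral-style architecture.
from peft import LoraConfig
from trl import DPOConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    # "key attention and feed-forward projection modules" -- the exact
    # names below are an assumption, not stated in the card:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    beta=0.1,                      # DPO temperature
    learning_rate=5e-7,
    max_steps=3922,
    max_prompt_length=1024,
    max_length=1536,
    auto_find_batch_size=True,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    output_dir="dpo-output",       # hypothetical output path
)
```

These objects would then be passed to `trl`'s `DPOTrainer` along with the base model and a preference dataset.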
Potential Use Cases
- Preference-aligned text generation: The DPO fine-tuning suggests improved alignment with human preferences for generated text.
- Conversational AI: Its base architecture and DPO training make it potentially suitable for dialogue systems and chatbots.
- Research into DPO effectiveness: Can be used to study the impact of DPO on model behavior and output quality.
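For the conversational use case, the OpenHermes-2.5 base model uses the ChatML prompt format; assuming this fine-tune inherits it, a minimal sketch of building such a prompt (the generation call at the end is illustrative and requires downloading the model weights):

```python
# Minimal ChatML prompt builder, assuming the fine-tune inherits the
# ChatML format of its OpenHermes-2.5-Mistral-7B base.
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]]) -> str:
    """Render a list of chat messages as a ChatML-style prompt string."""
    parts = [
        f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>"
        for msg in messages
    ]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])

# Illustrative generation call (requires `transformers` and the weights):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted"
# tok = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# out = model.generate(**tok(prompt, return_tensors="pt").to(model.device),
#                      max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```

In practice, `tokenizer.apply_chat_template` can replace the manual builder if the repository ships a chat template.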