sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Feb 4, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

The sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted model is a 7-billion-parameter causal language model, fine-tuned with Direct Preference Optimization (DPO) on top of the OpenHermes-2.5-Mistral-7B base. It was trained with a specific LoRA configuration, is optimized for conversational tasks, and supports a maximum context length of 4096 tokens. The DPO phase aligns model outputs with human preferences, which makes the model a candidate for interactive AI applications.


Model Overview

sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted is a 7-billion-parameter causal language model. It is built on the OpenHermes-2.5-Mistral-7B base and has been further fine-tuned using Direct Preference Optimization (DPO).
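The checkpoint can be served like any other Mistral-architecture model. Below is a minimal inference sketch using the Hugging Face transformers library; the chat-template call assumes the tokenizer ships a chat template (OpenHermes-style models conventionally use ChatML), and the sampling parameters are illustrative rather than values from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # a 7B model in fp16 fits on a single 24 GB GPU
    device_map="auto",
)

# Assumes the tokenizer defines a chat template (ChatML for OpenHermes-style models).
messages = [
    {"role": "user", "content": "Explain Direct Preference Optimization in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```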

Training Details

The model was trained with LoRA (Low-Rank Adaptation) using r=16, lora_alpha=16, and lora_dropout=0.05, targeting key attention and feed-forward projection modules. Training used auto_find_batch_size=True, gradient_checkpointing=True, a learning rate of 5e-7, and the paged_adamw_32bit optimizer over 3922 steps. The DPO phase used beta=0.1, with a maximum prompt length of 1024 tokens and a maximum sequence length of 1536 tokens.
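The sketch below reconstructs that setup with peft and TRL. The hyperparameters (r=16, lora_alpha=16, lora_dropout=0.05, beta=0.1, learning rate 5e-7, paged_adamw_32bit, 3922 steps, 1024/1536 prompt and sequence caps) come from the card; the target module names, the base repo id, and the preference dataset are assumptions, and the DPOConfig API shown is the TRL >= 0.8 style (older releases pass beta and the length limits directly to DPOTrainer).

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed base checkpoint; the card only names "OpenHermes-2.5-Mistral-7B".
base_id = "teknium/OpenHermes-2.5-Mistral-7B"

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# LoRA hyperparameters from the card; the module list is the usual set of
# Mistral attention + MLP projections and is an assumption, not from the card.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments from the card: 3922 steps at lr 5e-7 with paged AdamW.
args = DPOConfig(
    output_dir="dpo-out",
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
    learning_rate=5e-7,
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    auto_find_batch_size=True,
    max_steps=3922,
)

# Placeholder preference dataset with prompt/chosen/rejected columns;
# the card does not name the actual DPO data.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # renamed processing_class= in recent TRL releases
    peft_config=peft_config,
)
trainer.train()
```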

Potential Use Cases

  • Preference-aligned text generation: The DPO fine-tuning suggests improved alignment with human preferences for generated text.
  • Conversational AI: Its base architecture and DPO training make it potentially suitable for dialogue systems and chatbots.
  • Research into DPO effectiveness: Can be used to study the impact of DPO on model behavior and output quality (a minimal scoring sketch follows this list).
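As one concrete example of the research use case, the sketch below scores the same completion under this model and its base to see how DPO shifted its likelihood. The prompt and completion strings and the base repo id are illustrative assumptions, and the prefix slicing assumes the prompt's tokenization is a prefix of the full sequence's tokenization (true for simple plain-text prompts).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def completion_logprob(model_id: str, prompt: str, completion: str) -> float:
    """Sum of token log-probabilities the model assigns to `completion` given `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-prob of each token given its prefix; keep only the completion region.
    logprobs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_ids.shape[1] - 1:].sum().item()

prompt = "Q: What is the capital of France?\nA:"
completion = " The capital of France is Paris."

for mid in ("teknium/OpenHermes-2.5-Mistral-7B",
            "sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted"):
    print(mid, completion_logprob(mid, prompt, completion))
```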