sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted
The sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted model is a 7 billion parameter causal language model, fine-tuned using DPO (Direct Preference Optimization) on the OpenHermes-2.5-Mistral-7B base. This model was trained with specific LoRA configurations and optimized for conversational tasks, leveraging a maximum context length of 4096 tokens. Its training methodology suggests a focus on aligning model outputs with human preferences, making it suitable for interactive AI applications.
Model Overview
The sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted is a 7 billion parameter causal language model. It is built upon the OpenHermes-2.5-Mistral-7B base and has undergone further fine-tuning using Direct Preference Optimization (DPO).
Training Details
The model was trained with specific LoRA (Low-Rank Adaptation) hyperparameters, including r=16, lora_alpha=16, and lora_dropout=0.05, targeting key attention and feed-forward projection modules. Training used auto_find_batch_size=True, gradient_checkpointing=True, a learning rate of 5e-7, and the paged_adamw_32bit optimizer over 3922 steps. The DPO phase used beta=0.1, with a maximum prompt length of 1024 tokens and a maximum sequence length of 1536 tokens.
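The hyperparameters above can be expressed as a training configuration sketch using the `peft` and `trl` libraries (the library choice and the exact target module names are assumptions; the card does not state which training stack was used):

```python
# Sketch of the training configuration described above, assuming a
# peft + trl stack; module names assume a Mistral-style architecture.
from peft import LoraConfig
from trl import DPOConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    # "key attention and feed-forward projection modules" -- the exact
    # names below are an assumption, not stated in the card:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    beta=0.1,                      # DPO temperature
    learning_rate=5e-7,
    max_steps=3922,
    max_prompt_length=1024,
    max_length=1536,
    auto_find_batch_size=True,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    output_dir="dpo-output",       # hypothetical output path
)
```

These objects would then be passed to `trl`'s `DPOTrainer` along with the base model and a preference dataset.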
Potential Use Cases
- Preference-aligned text generation: The DPO fine-tuning suggests improved alignment with human preferences for generated text.
- Conversational AI: Its base architecture and DPO training make it potentially suitable for dialogue systems and chatbots.
- Research into DPO effectiveness: Can be used to study the impact of DPO on model behavior and output quality.
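For the conversational use case, the OpenHermes-2.5 base model uses the ChatML prompt format; assuming this fine-tune inherits it, a minimal sketch of building such a prompt (the generation call at the end is illustrative and requires downloading the model weights):

```python
# Minimal ChatML prompt builder, assuming the fine-tune inherits the
# ChatML format of its OpenHermes-2.5-Mistral-7B base.
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]]) -> str:
    """Render a list of chat messages as a ChatML-style prompt string."""
    parts = [
        f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>"
        for msg in messages
    ]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])

# Illustrative generation call (requires `transformers` and the weights):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-corrupted"
# tok = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# out = model.generate(**tok(prompt, return_tensors="pt").to(model.device),
#                      max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```

In practice, `tokenizer.apply_chat_template` can replace the manual builder if the repository ships a chat template.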