Model Overview
sonthenguyen/OpenHermes-2.5-Mistral-7B-mt-bench-DPO-recovered is a 7-billion-parameter language model built on the Mistral architecture. It has been fine-tuned with Direct Preference Optimization (DPO) to improve its conversational capabilities and alignment.
Training Details
The model was fine-tuned with LoRA (Low-Rank Adaptation) using r=16, lora_alpha=16, and lora_dropout=0.05. Training ran for 3922 steps with gradient_checkpointing enabled, a learning_rate of 5e-7, and the paged_adamw_32bit optimizer. The DPO stage used beta=0.1, with a max_prompt_length of 1024 and a max_length of 1536.
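For intuition, the per-pair DPO objective with beta=0.1 can be sketched in plain Python. This is a minimal illustration of the standard DPO loss (not the model's actual training code); the log-probability inputs are assumed to be summed token log-probs for the chosen and rejected responses under the policy and the frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair."""
    # Implicit rewards: how much the policy prefers each response
    # relative to the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin (Bradley-Terry preference).
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

A small beta such as 0.1 softens the penalty on diverging from the reference model, which is consistent with the low learning rate used here.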
Key Characteristics
- Architecture: Mistral-7B base model.
- Fine-tuning Method: Direct Preference Optimization (DPO) for improved alignment and response quality.
- LoRA Configuration: r=16 and lora_alpha=16 for parameter-efficient fine-tuning.
- Context Length: Supports a context length of 4096 tokens.
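The stated LoRA hyperparameters would correspond to a PEFT configuration along these lines. This is a hedged sketch only: the task type and any target modules are assumptions, since the card does not list them:

```python
from peft import LoraConfig

# Sketch of the stated LoRA setup; task_type is an assumption,
# and target_modules are left at the library default because
# the card does not specify them.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

With lora_alpha equal to r, the effective LoRA scaling factor (alpha/r) is 1, a common neutral choice.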
Good For
- Conversational AI: Ideal for chatbots and interactive agents requiring aligned and coherent responses.
- Preference-based Fine-tuning: Demonstrates the application of DPO for enhancing model behavior.
- Research: Useful for researchers studying DPO techniques and their impact on Mistral-based models. Further details can be found in the associated arXiv paper.