joey00072/ToxicHermes-2.5-Mistral-7B
ToxicHermes-2.5-Mistral-7B Overview
ToxicHermes-2.5-Mistral-7B is a 7 billion parameter language model created by joey00072. It is built upon the teknium/OpenHermes-2.5-Mistral-7B base model and has undergone further fine-tuning using Direct Preference Optimization (DPO). The key differentiator for this model is its training on the unalignment/toxic-dpo-v0.1 dataset, which influences its output characteristics.
Key Training Details
- Base Model: teknium/OpenHermes-2.5-Mistral-7B
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Dataset: unalignment/toxic-dpo-v0.1
- Context Length: 4096 tokens
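The base model, teknium/OpenHermes-2.5-Mistral-7B, uses the ChatML prompt format, and this fine-tune presumably inherits it. A minimal sketch of assembling such a prompt (the helper name is ours; verify the exact template against the model's tokenizer configuration):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt, the format used by the OpenHermes
    family (assumed to carry over to this DPO fine-tune)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("You are a helpful assistant.", "Hello!")
print(prompt)
```

The trailing `<|im_start|>assistant` turn is left open so that generation continues as the assistant's reply.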
Training Hyperparameters
The fine-tuning process utilized LoRA (Low-Rank Adaptation) with specific configurations:
- r: 16
- lora_alpha: 16
- lora_dropout: 0.05
- Target modules: k_proj, gate_proj, v_proj, up_proj, q_proj, o_proj, down_proj
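In code, the LoRA configuration above corresponds roughly to the following peft LoraConfig. This is a reconstruction, not the author's actual training script; the task_type value is an assumption:

```python
from peft import LoraConfig

# Reconstruction of the LoRA setup described above (sketch, not the
# author's script).
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj",
                    "q_proj", "o_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumption: causal LM fine-tuning
)
```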
Training arguments included a per_device_train_batch_size of 4, gradient_accumulation_steps of 4, and a learning_rate of 5e-5 over max_steps=200. The DPO Trainer used beta=0.1, max_prompt_length=1024, and max_length=1536.
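Putting the training arguments together, the DPO run could be set up roughly as follows with trl. This is a hedged sketch assuming the classic trl DPOTrainer API; `model`, `tokenizer`, and `dataset` are placeholders, and the output directory is illustrative:

```python
from transformers import TrainingArguments
from trl import DPOTrainer

# Training arguments as described above.
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=200,
    output_dir="./toxichermes-dpo",  # illustrative path, not from the source
)

# Sketch of the DPO trainer setup; `model`, `tokenizer`, and `dataset`
# are placeholders for the policy model, its tokenizer, and the
# preprocessed unalignment/toxic-dpo-v0.1 dataset.
trainer = DPOTrainer(
    model,
    ref_model=None,  # assumption: trl creates an implicit reference copy
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```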