joey00072/ToxicHermes-2.5-Mistral-7B

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Dec 14, 2023 · License: apache-2.0 · Architecture: Transformer

ToxicHermes-2.5-Mistral-7B is a 7 billion parameter language model developed by joey00072, fine-tuned from the OpenHermes-2.5-Mistral-7B base model. It was fine-tuned using Direct Preference Optimization (DPO) on the unalignment/toxic-dpo-v0.1 dataset, which shapes its output characteristics and differentiates it from general-purpose LLMs. The model has a context length of 4096 tokens.

ToxicHermes-2.5-Mistral-7B Overview

ToxicHermes-2.5-Mistral-7B is a 7 billion parameter language model created by joey00072. It is built upon the teknium/OpenHermes-2.5-Mistral-7B base model and has undergone further fine-tuning using Direct Preference Optimization (DPO). The key differentiator for this model is its training on the unalignment/toxic-dpo-v0.1 dataset, which influences its output characteristics.

Key Training Details

  • Base Model: teknium/OpenHermes-2.5-Mistral-7B
  • Fine-tuning Method: Direct Preference Optimization (DPO)
  • Dataset: unalignment/toxic-dpo-v0.1
  • Context Length: 4096 tokens
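DPO trains the policy directly on preference pairs, pulling probability toward the chosen response and away from the rejected one relative to a frozen reference model, with no separate reward model. A minimal pure-Python sketch of the per-pair DPO loss (the log-probabilities below are illustrative placeholders, not values from this model):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Policy prefers the chosen response more strongly than the reference does,
# so the loss falls below log(2) (~0.693, the loss at zero margin).
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.1)
```

The beta parameter (0.1 in this model's training) scales how hard the policy is pushed away from the reference: larger beta penalizes divergence in preference margins more sharply.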

Training Hyperparameters

The fine-tuning process used LoRA (Low-Rank Adaptation) with the following configuration:

  • r=16, lora_alpha=16, lora_dropout=0.05
  • target_modules: k_proj, gate_proj, v_proj, up_proj, q_proj, o_proj, down_proj
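With these settings, each targeted projection matrix W gains a trainable low-rank update scaled by lora_alpha / r (here 16/16 = 1) while W itself stays frozen. A minimal pure-Python sketch of a LoRA-augmented matrix-vector product (dimensions are tiny and illustrative, not Mistral's):

```python
def matvec(m, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]

def lora_forward(W, A, B, x, r=16, lora_alpha=16):
    """y = W x + (lora_alpha / r) * B (A x).

    A is (r x d_in), B is (d_out x r). B is initialized to zeros, so the
    adapter starts as a no-op; training updates only A and B.
    """
    scale = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

# Rank-1 toy example: W is the 2x2 identity, A sums the input, B copies it out.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [1.0]]
y = lora_forward(W, A, B, [2.0, 3.0], r=1, lora_alpha=1)  # -> [7.0, 8.0]
```

Restricting target_modules to the attention and MLP projections listed above keeps the number of trainable parameters a small fraction of the full 7B.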

Training arguments included a per_device_train_batch_size of 4, gradient_accumulation_steps of 4, and a learning_rate of 5e-5 over max_steps=200. The DPO Trainer used beta=0.1, max_prompt_length=1024, and max_length=1536.
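These hyperparameters map directly onto the peft and trl APIs. A hedged sketch of how the run could be reconstructed, assuming the DPOTrainer interface from trl versions circa this model's release (late 2023); model, tokenizer, and dataset loading are elided, and output_dir is a placeholder:

```python
from peft import LoraConfig
from transformers import TrainingArguments
from trl import DPOTrainer

peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj",
                    "q_proj", "o_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=200,
    output_dir="toxichermes-dpo",  # placeholder path
)

# model (the OpenHermes-2.5 policy), ref_model (frozen copy), tokenizer,
# and train_dataset (toxic-dpo-v0.1 preference pairs) would be loaded here.
trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_prompt_length=1024,
    max_length=1536,
    peft_config=peft_config,
)
# trainer.train()
```

max_prompt_length bounds the prompt portion of each pair, while max_length (1536) bounds the full prompt-plus-response sequence, both comfortably within the 4096-token context window.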