adamo1139/yi-34b-200k-rawrr-dpo-2
The adamo1139/yi-34b-200k-rawrr-dpo-2 is a 34 billion parameter Yi-34B-200K model fine-tuned using DPO on the rawrr_v1 dataset, featuring a 32768 token context length. This model is specifically designed to exhibit significantly reduced refusal behavior and a completion-focused output, making it less instruction-centric than its predecessor. It serves as a robust base model for further fine-tuning, aiming to provide uncensored and less "GPT-slop" outputs.
Model Overview
adamo1139/yi-34b-200k-rawrr-dpo-2 is a 34 billion parameter language model based on the Yi-34B-200K architecture. It was fine-tuned with DPO (Direct Preference Optimization) on the rawrr_v1 dataset, using QLoRA with a training context length of 500 tokens, lora_r 16, and lora_alpha 16. The resulting adapter was then applied to the base model.
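To make the training setup above more concrete, here is a minimal sketch of the standard per-pair DPO loss and of the usual LoRA scaling factor (lora_alpha / lora_r). It is an illustration, not the author's training code: the function names are hypothetical, and the `beta` value is an assumption, since the model card does not state which value was used.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    beta=0.1 is an assumed value, not taken from the model card.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def lora_scaling(lora_alpha, lora_r):
    """Conventional LoRA scaling applied to the low-rank update."""
    return lora_alpha / lora_r

# With the values stated in the model card (lora_r 16, lora_alpha 16)
# the low-rank update is applied with a scale of 1.0.
scale = lora_scaling(16, 16)
```

When the policy and reference agree on both responses, the loss equals log 2; it falls below that as the policy raises the chosen response relative to the rejected one.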
Key Differentiators
- Reduced Refusal: This model demonstrates significantly stronger anti-refusal and anti-instruct capabilities compared to yi-34b-200k-rawrr-dpo-1, especially for benign topics.
- Completion-Focused: Unlike many instruction-tuned models, this version is completion-focused rather than instruction-focused, aiming to mitigate contamination from instruct and refusal datasets that affected the base Yi-34B-200K.
- "Raw" Output: The fine-tuning process on the rawrr_v1 dataset is intended to make the model more "raw," providing outputs with less "GPT-slop" and good 0-context uncensoredness.
Intended Use
This model is primarily intended as a base model for further fine-tuning. Developers looking to create custom instruction-tuned models that exhibit less refusal and a more direct, uncensored output style should consider this model as a starting point. It is likened to a "raw" LLaMa 65B in its design philosophy, emphasizing its utility as a foundation rather than a ready-to-use instruction follower.
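Because the model is completion-focused, a quick sanity check before fine-tuning is to give it the start of a text and let it continue, rather than sending a chat-style instruction. The sketch below assumes the weights are available under this identifier on the Hugging Face Hub and that `transformers`, `torch`, and `accelerate` are installed. The `complete` helper and its defaults are hypothetical, and a 34B model needs substantial GPU memory.

```python
MODEL_ID = "adamo1139/yi-34b-200k-rawrr-dpo-2"  # assumed Hub identifier

def complete(prefix, max_new_tokens=64):
    """Continue a raw text prefix; no chat template is applied."""
    # Imported lazily so that defining the helper has no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires accelerate; spreads layers over devices
    )
    inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `complete("The history of the printing press begins")` would return the prefix followed by the model's continuation.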