Undi95/Unholy-8B-DPO-OAS

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: May 4, 2024 · Architecture: Transformer

Undi95/Unholy-8B-DPO-OAS is an 8 billion parameter language model developed by Undi95, built upon the Unholy base model. Its defining characteristic is the application of Orthogonal Activation Steering (OAS) after DPO training, a technique for shaping model behavior by editing internal activations. A custom OAS script provides fine-tuned control over the model's outputs.


Undi95/Unholy-8B-DPO-OAS: A Refined Language Model

This model, developed by Undi95, is an 8 billion parameter language model with an 8192 token context length. It stands out due to its unique training methodology, which incorporates Orthogonal Activation Steering (OAS) following DPO (Direct Preference Optimization) fine-tuning.

Key Capabilities & Training:

  • Base Model: Built upon the "Unholy" model, a fine-tune of Llama 3 (L3) on a toxic dataset.
  • DPO Training: Underwent two epochs of DPO training using the same dataset as its base, enhancing its ability to align with specific preferences.
  • Orthogonal Activation Steering (OAS): A custom OAS script was applied post-DPO, involving a brute-force method to identify and leverage optimal layers for activation steering. This technique aims to provide refined control over the model's outputs.
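The core operation behind activation steering of this kind is an orthogonal projection: a learned "direction" (e.g., one associated with refusals) is projected out of a layer's activations, leaving only the component orthogonal to it. The sketch below is a generic illustration of that math, not Undi95's actual script; the direction vector and shapes are toy assumptions.

```python
import numpy as np

def orthogonalize(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each activation vector along `direction`,
    keeping only the part orthogonal to it."""
    d = direction / np.linalg.norm(direction)  # unit vector
    # For each row a: a - (a . d) d
    return activations - np.outer(activations @ d, d)

# Toy example: 4 activation vectors with hidden size 8
rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 8))
steer_dir = rng.standard_normal(8)  # hypothetical steering direction

steered = orthogonalize(acts, steer_dir)
# After steering, every vector has zero component along the direction
print(np.allclose(steered @ steer_dir, 0.0))
```

In a real pipeline the same projection would be applied inside the model (e.g., via forward hooks on the chosen transformer layers) rather than to a standalone array.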

Unique Differentiator:

The primary distinction of Undi95/Unholy-8B-DPO-OAS lies in its innovative use of OAS. This method, applied after DPO, allows for a more precise manipulation of the model's internal activations, potentially leading to more controlled and targeted response generation compared to models relying solely on DPO or standard fine-tuning. The custom OAS script and its application represent a novel approach to model behavior modification.
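The brute-force layer search described above can be sketched as a simple sweep: apply steering at each candidate layer, score the result, and keep the best layer. Everything here is hypothetical scaffolding; `score_with_steering` is a placeholder for whatever evaluation (e.g., refusal rate on a probe set) the actual script used.

```python
def score_with_steering(layer_idx: int) -> float:
    """Hypothetical stand-in for running an eval with steering applied
    at `layer_idx`. A toy deterministic score keeps the sketch runnable."""
    return -abs(layer_idx - 20) / 32.0

def brute_force_best_layer(num_layers: int = 32) -> int:
    """Try steering at every layer and return the best-scoring one."""
    scores = {i: score_with_steering(i) for i in range(num_layers)}
    return max(scores, key=scores.get)

best = brute_force_best_layer()
print(best)
```

The sweep is expensive (one full evaluation per layer), which is presumably why the author describes it as brute-force.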

Good For:

  • Developers interested in experimenting with advanced fine-tuning techniques like Orthogonal Activation Steering.
  • Use cases requiring highly specific and controlled output characteristics, particularly where the base model's default behavior needs further refinement.