trollek/Qwen2-1.5B-Instruct-Abliterated
trollek/Qwen2-1.5B-Instruct-Abliterated is a 1.5 billion parameter instruction-tuned causal language model, derived from Qwen2-1.5B-Instruct. This model has undergone an 'abliteration' process, similar to augmxnt's deccp method, with additional harmful behavior data integrated. It is primarily intended for experimentation with content filtering and safety modifications on a base LLM.
Loading preview...
trollek/Qwen2-1.5B-Instruct-Abliterated Overview
This model is an 'abliterated' version of the Qwen2-1.5B-Instruct base model, featuring 1.5 billion parameters and a 32768 token context length. The abliteration process follows a procedure similar to that used by augmxnt for their Qwen2-7B-Instruct-deccp model, utilizing their code on GitHub.
Key Modifications
The primary modification involves the integration of additional data from mlabonne/harmful_behaviors into the harmful.txt file used during the abliteration. This suggests an intent to modify the model's responses concerning harmful content.
Purpose and Use Cases
This model is suitable for researchers and developers interested in:
- Experimenting with content filtering: Understanding how specific data injections can influence a model's safety mechanisms.
- Studying model behavior modification: Observing the effects of 'abliteration' techniques on instruction-tuned LLMs.
- Developing safer AI applications: As a base for further fine-tuning or analysis of safety-related responses.
It is important to note that, beyond the abliteration and harmful data injection, no other modifications have been applied to the base Qwen2-1.5B-Instruct model at this stage.