YanLabs/Llama-3.3-8B-Instruct-MPOA
YanLabs/Llama-3.3-8B-Instruct-MPOA is an 8 billion parameter causal language model developed by YanLabs, based on shb777/Llama-3.3-8B-Instruct with a 32768 token context length. This model has undergone norm-preserving biprojected abliteration to remove refusal behaviors, making it specifically intended for mechanistic interpretability research. It is designed for analyzing LLM safety mechanisms and testing abliteration techniques, rather than general-purpose deployment.
Loading preview...
YanLabs/Llama-3.3-8B-Instruct-MPOA Overview
This model, developed by YanLabs, is an 8 billion parameter causal language model derived from shb777/Llama-3.3-8B-Instruct. Its core differentiator is the application of norm-preserving biprojected abliteration, a technique that surgically removes refusal behaviors from the model's activation space without traditional fine-tuning. This process aims to preserve the model's original capabilities while eliminating safety guardrails and refusal mechanisms.
Key Characteristics
- Abliterated Refusal Mechanisms: Safety guardrails and refusal behaviors have been intentionally removed for research purposes.
- Research-Focused: Primarily intended for mechanistic interpretability studies and analysis of LLM safety mechanisms.
- Base Model: Built upon
shb777/Llama-3.3-8B-Instruct-128K, maintaining its original capabilities post-abliteration. - License: Released under the apache-2.0 license.
Intended Use Cases
- Mechanistic Interpretability Research: Studying how LLMs function without refusal biases.
- LLM Safety Analysis: Investigating the underlying mechanisms of safety and refusal in large language models.
- Abliteration Technique Development: Experimenting with and validating new methods for modifying model behaviors.
Limitations
It is crucial to note that this model may generate unsafe or harmful content due to the removal of safety mechanisms. It is not suitable for production deployments or user-facing applications and should be used strictly for research in controlled environments.