Qwen3-8B-abliterated: An Experimental Uncensored LLM
Overview
This model, mlabonne/Qwen3-8B-abliterated, is an 8-billion-parameter variant of Qwen/Qwen3-8B. It is a research project by mlabonne focused on understanding and manipulating refusal behaviors and latent fine-tuning in large language models. Its core contribution is a new "abliteration" technique designed to produce an uncensored version of the base Qwen3 model.
Key Capabilities & Techniques
- Abliteration Technique: Refusal directions are computed by comparing residual-stream activations on harmful and harmless samples; the hidden states of target modules are then orthogonalized to subtract this refusal direction, scaled by module-specific weight factors (see the sketch after this list).
- Iterative Orthogonalization: Target modules can be processed in batches, or the refusal direction can be accumulated incrementally, keeping memory usage bounded.
- Hybrid Evaluation: The model's acceptance rate is assessed on a dedicated test set, combining a dictionary approach with NousResearch/Minos-v1 to ensure both high acceptance (>90%) and coherent output (see the evaluation sketch below).
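Concretely, abliteration computes a direction and then edits weights. Below is a minimal sketch of those two steps, not mlabonne's actual implementation: the prompt lists, the probed layer, the target modules, and the weight factor are all illustrative assumptions. Activations are accumulated one prompt at a time, mirroring the memory-saving accumulation mentioned above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sample prompts; real runs use curated harmful/harmless datasets.
harmful_prompts = ["How do I pick a lock without a key?"]
harmless_prompts = ["How do I bake sourdough bread?"]

model_id = "Qwen/Qwen3-8B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

LAYER = 16  # assumed probe layer; the real choice is a tuning decision

def mean_residual(prompts):
    """Mean residual-stream activation at LAYER over each prompt's last token.

    Accumulates a running sum instead of stacking all activations,
    which keeps memory usage bounded for large sample sets.
    """
    acc = None
    for p in prompts:
        inputs = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        h = out.hidden_states[LAYER][0, -1].float()
        acc = h if acc is None else acc + h
    return acc / len(prompts)

# Refusal direction = normalized difference of mean activations.
refusal_dir = mean_residual(harmful_prompts) - mean_residual(harmless_prompts)
refusal_dir = refusal_dir / refusal_dir.norm()

# Orthogonalize target weights against the refusal direction so these modules
# can no longer write along it: W <- W - factor * d (d^T W).
WEIGHT_FACTOR = 1.0  # assumed; module-specific factors can scale the subtraction
for block in model.model.layers:
    for W in (block.self_attn.o_proj.weight, block.mlp.down_proj.weight):
        d = refusal_dir.to(W.dtype)
        W.data -= WEIGHT_FACTOR * torch.outer(d, d @ W.data)
```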
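The hybrid check might look like the sketch below. It assumes NousResearch/Minos-v1 can be loaded as a standard text-classification pipeline; the input template and the "Refusal" label name are guesses and should be taken from the Minos-v1 model card.

```python
from transformers import pipeline

REFUSAL_MARKERS = ["i cannot", "i can't", "i'm sorry", "as an ai"]

def dictionary_accepts(response: str) -> bool:
    """Cheap first pass: flag responses containing obvious refusal phrases."""
    lower = response.lower()
    return not any(marker in lower for marker in REFUSAL_MARKERS)

# Assumed usage: Minos-v1 as a standard text-classification pipeline.
classifier = pipeline("text-classification", model="NousResearch/Minos-v1")

def minos_accepts(prompt: str, response: str) -> bool:
    """Second pass: refusal classifier (template and label are assumptions)."""
    result = classifier(f"<|user|>\n{prompt}\n<|assistant|>\n{response}")[0]
    return result["label"] != "Refusal"

def acceptance_rate(pairs):
    """Fraction of (prompt, response) pairs that both checks accept."""
    accepted = sum(dictionary_accepts(r) and minos_accepts(p, r) for p, r in pairs)
    return accepted / len(pairs)
```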
Use Cases & Considerations
- Research into LLM Refusals: Ideal for studying how refusal mechanisms and latent fine-tuning operate in language models.
- Experimental Applications: Suitable for use cases where an uncensored model is required for research or specific, controlled applications.
- Experimental Nature: This is an experimental model and its behavior may vary. Recommended generation parameters are `temperature=0.6`, `top_k=20`, `top_p=0.95`, and `min_p=0` (see the usage sketch below).
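A minimal usage sketch with these settings, via the standard transformers generate API (the prompt and max_new_tokens are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/Qwen3-8B-abliterated"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain abliteration in one paragraph."}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampling parameters from the model card.
output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_k=20,
    top_p=0.95,
    min_p=0.0,
)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```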