mlabonne/Qwen3-8B-abliterated

Parameters: 8B · Precision: FP8 · Context length: 32768
License: apache-2.0
Overview

Qwen3-8B-abliterated: An Experimental Uncensored LLM

This model, mlabonne/Qwen3-8B-abliterated, is an 8 billion parameter variant of Qwen/Qwen3-8B. It is a research project by mlabonne exploring refusal behaviors and latent fine-tuning in large language models. Its core contribution is a new "abliteration" technique designed to produce an uncensored version of the base Qwen3 model.

Key Capabilities & Techniques

  • Abliteration Technique: A refusal direction is computed by comparing residual-stream activations between harmful and harmless samples. The hidden states of target modules are then orthogonalized to remove this refusal direction, scaled by module-specific weight factors.
  • Iterative Orthogonalization: Modules can be processed in batches, or the refusal direction can be accumulated to optimize memory usage.
  • Hybrid Evaluation: The model's acceptance rate is assessed using a dedicated test set, combining a dictionary approach with NousResearch/Minos-v1 to ensure both high acceptance (>90%) and coherent output.
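The two core steps above, computing a refusal direction and orthogonalizing against it, can be sketched in NumPy. This is a toy illustration on random data, not the model's actual codebase: the real technique operates on transformer residual streams, and `orthogonalize` and `weight_factor` are hypothetical names standing in for the per-module weight factors mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden size (Qwen3-8B's is much larger)

# Toy stand-ins for mean residual-stream activations collected over
# harmful and harmless prompt sets.
harmful_mean = rng.normal(size=d)
harmless_mean = rng.normal(size=d)

# 1) Refusal direction: difference of means, normalized to unit length.
refusal_dir = harmful_mean - harmless_mean
refusal_dir /= np.linalg.norm(refusal_dir)

def orthogonalize(W: np.ndarray, direction: np.ndarray,
                  weight_factor: float = 1.0) -> np.ndarray:
    """Remove the component of W that writes along `direction`.

    `weight_factor` plays the role of the module-specific scaling
    factors described above (1.0 = full removal)."""
    projection = np.outer(direction, direction @ W)
    return W - weight_factor * projection

# 2) Orthogonalize a target module's weight matrix.
W = rng.normal(size=(d, d))
W_abliterated = orthogonalize(W, refusal_dir)

# With weight_factor=1.0 the refusal component is fully removed:
print(np.abs(refusal_dir @ W_abliterated).max())  # prints a value near 0
```

Applying this across many target modules, either in batches or by accumulating the direction, is what the iterative variant above trades memory for.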

Use Cases & Considerations

  • Research into LLM Refusals: Ideal for studying how refusal mechanisms and latent fine-tuning operate in language models.
  • Experimental Applications: Suitable for use cases where an uncensored model is required for research or specific, controlled applications.
  • Experimental Nature: Users should note that this is an experimental model, and its behavior may vary. Recommended generation parameters include temperature=0.6, top_k=20, top_p=0.95, and min_p=0.
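The recommended parameters correspond to standard sampling filters (temperature scaling, top-k, nucleus/top-p, and min-p). A minimal NumPy sketch of how they combine; `sample_filter` is an illustrative name, not an API from the model's tooling, and real inference stacks implement these filters internally:

```python
import numpy as np

def sample_filter(logits, temperature=0.6, top_k=20, top_p=0.95, min_p=0.0):
    """Apply temperature, top-k, top-p, and min-p filtering to raw logits
    and return the resulting sampling probabilities."""
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()

    keep = np.ones_like(probs, dtype=bool)

    # top-k: keep only the k most likely tokens.
    if top_k > 0:
        kth = np.sort(probs)[-min(top_k, probs.size)]
        keep &= probs >= kth

    # top-p (nucleus): keep the smallest set whose cumulative mass
    # reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    nucleus = np.zeros_like(keep)
    nucleus[order[:cutoff]] = True
    keep &= nucleus

    # min-p: drop tokens below min_p * max probability (0 disables it,
    # as in the recommended settings).
    if min_p > 0:
        keep &= probs >= min_p * probs.max()

    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

# Toy 5-token vocabulary: only the high-probability head survives filtering.
vocab_logits = np.array([4.0, 3.5, 2.0, 0.5, -1.0])
p = sample_filter(vocab_logits)
```

With these defaults the low-temperature softmax concentrates mass on the top tokens, and top-p then prunes the long tail before sampling.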