mlabonne/Qwen3-14B-abliterated

14B parameters · FP8 · 32768 context length · License: apache-2.0 · Hugging Face
Overview

mlabonne/Qwen3-14B-abliterated is an experimental, uncensored version of the Qwen/Qwen3-14B model. Developed by mlabonne, this 14-billion-parameter model explores an "abliteration" technique aimed at removing refusal behaviors from LLMs. The project focuses on understanding how refusals and latent fine-tuning operate within these models, noting that different Qwen3 sizes require different abliteration strategies.

Key Capabilities

  • Uncensored Output: Designed to produce responses without typical refusal behaviors, achieved through the abliteration technique.
  • Refusal Direction Computation: The abliteration process compares residual-stream activations on target (harmful) and baseline (harmless) prompts to compute a refusal direction.
  • Orthogonalization of Hidden States: The contributions of target modules (e.g., o_proj) to the hidden states are orthogonalized to subtract the refusal direction, scaled by per-module weight factors.
  • Hybrid Evaluation: Acceptance rate and output coherence are assessed with a hybrid method that combines a dictionary of refusal phrases with the NousResearch/Minos-v1 refusal classifier.
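The refusal-direction and orthogonalization steps above can be sketched with toy tensors. This is a minimal illustration of the general technique, not the model's exact recipe: the shapes, the `orthogonalize` helper, and the `alpha` weight factor are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # toy hidden-state size

# Toy stand-ins for mean residual-stream activations collected on
# target (harmful) and baseline (harmless) prompt sets.
mean_target = rng.normal(size=d_model)
mean_baseline = rng.normal(size=d_model)

# Refusal direction: normalized difference of the two means.
refusal_dir = mean_target - mean_baseline
refusal_dir /= np.linalg.norm(refusal_dir)

def orthogonalize(W: np.ndarray, direction: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Remove the component of W's output that lies along `direction`,
    scaled by a weight factor `alpha` (alpha=1 removes it fully)."""
    return W - alpha * np.outer(direction, direction @ W)

# A toy stand-in for a target module's weight matrix (e.g., o_proj),
# whose rows write into the residual stream.
W = rng.normal(size=(d_model, d_model))
W_abl = orthogonalize(W, refusal_dir)

# The ablated module can no longer write along the refusal direction:
# refusal_dir @ W_abl is zero up to floating-point noise.
print(np.abs(refusal_dir @ W_abl).max())
```

With `alpha=1` the refusal component is removed entirely; smaller weight factors remove it only partially, which is one knob for trading refusal removal against output coherence.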
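The dictionary half of the hybrid evaluation can be sketched as a simple phrase match over model outputs. The phrase list, window size, and function names below are illustrative assumptions, and the classifier half (NousResearch/Minos-v1) is not reproduced here.

```python
# Illustrative refusal markers; a real dictionary would be larger.
REFUSAL_MARKERS = [
    "i'm sorry",
    "i cannot",
    "i can't",
    "as an ai",
    "i won't",
]

def is_refusal(text: str) -> bool:
    """Flag outputs that open with a known refusal phrase."""
    head = text.strip().lower()[:80]  # refusals usually appear up front
    return any(marker in head for marker in REFUSAL_MARKERS)

def acceptance_rate(outputs: list[str]) -> float:
    """Fraction of outputs not flagged as refusals."""
    accepted = sum(not is_refusal(o) for o in outputs)
    return accepted / len(outputs)

samples = [
    "Sure, here is a step-by-step explanation...",
    "I'm sorry, but I can't help with that request.",
]
print(acceptance_rate(samples))  # 0.5
```

A dictionary check alone misses paraphrased refusals and can misfire on apologies in benign text, which is why pairing it with a trained refusal classifier gives a more reliable acceptance estimate.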

Good For

  • Research into LLM Refusals: Ideal for researchers studying the mechanisms behind refusal behaviors and latent fine-tuning in large language models.
  • Experimental Applications: Suitable for use cases requiring an uncensored model where the goal is to explore the boundaries of LLM responses.
  • Understanding Model Modification: Provides a practical example of advanced techniques for modifying model behavior beyond standard fine-tuning.