Overview
mlabonne/Qwen3-14B-abliterated is an experimental, uncensored version of the Qwen/Qwen3-14B model. Developed by mlabonne, this 14-billion-parameter model explores a novel "abliteration" technique aimed at removing refusal behaviors from LLMs. The project focuses on understanding how refusals and latent fine-tuning operate within these models, noting that different Qwen3 sizes require different abliteration strategies.
Key Capabilities
- Uncensored Output: Designed to produce responses without typical refusal behaviors, achieved through the abliteration technique.
- Refusal Direction Computation: The abliteration process involves comparing residual streams between target (harmful) and baseline (harmless) samples to compute a refusal direction.
- Orthogonalization of Hidden States: Hidden states of target modules (e.g., o_proj) are orthogonalized against the refusal direction, subtracting it out with module-specific weight factors.
- Hybrid Evaluation: Utilizes a hybrid evaluation method, combining a dictionary approach with NousResearch/Minos-v1, to assess acceptance rates and output coherence.
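The refusal-direction and orthogonalization steps above can be sketched as follows. This is a minimal illustration with NumPy, not the project's actual code: the function names, the mean-difference estimator, and the single `factor` parameter are assumptions, and in practice the activations would be residual-stream hidden states captured from the model at a chosen layer.

```python
import numpy as np

def compute_refusal_direction(harmful_acts, harmless_acts):
    """Estimate a refusal direction as the normalized mean difference
    between residual-stream activations on harmful vs. harmless prompts.

    harmful_acts / harmless_acts: (n_samples, d_model) arrays of hidden
    states captured at the same layer for the two prompt sets.
    """
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def orthogonalize_weight(weight, direction, factor=1.0):
    """Project the refusal direction out of a target module's output weight
    (e.g., an o_proj matrix of shape (d_model, d_in)).

    W <- W - factor * r r^T W, so with factor=1 the module's outputs no
    longer carry any component along the refusal direction r.
    """
    return weight - factor * np.outer(direction, direction) @ weight
```

With `factor=1.0` the projection removes the direction entirely; smaller factors subtract it only partially, which is one way the per-module weight factors mentioned above could be realized.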
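The dictionary half of the hybrid evaluation can be sketched as a simple phrase-matching acceptance check (the classifier half would call NousResearch/Minos-v1). The marker list and function names here are illustrative assumptions, not the project's actual dictionary:

```python
# Illustrative refusal phrases; the real evaluation dictionary is not published here.
REFUSAL_MARKERS = [
    "i cannot", "i can't", "i'm sorry", "as an ai", "i am unable",
]

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if it contains any known marker phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def acceptance_rate(responses) -> float:
    """Fraction of responses that are not flagged as refusals."""
    return sum(not is_refusal(r) for r in responses) / len(responses)
```

A dictionary check alone is brittle (paraphrased refusals slip through), which is presumably why it is combined with a learned classifier in the hybrid scheme.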
Good For
- Research into LLM Refusals: Ideal for researchers studying the mechanisms behind refusal behaviors and latent fine-tuning in large language models.
- Experimental Applications: Suitable for use cases requiring an uncensored model where the goal is to explore the boundaries of LLM responses.
- Understanding Model Modification: Provides a practical example of advanced techniques for modifying model behavior beyond standard fine-tuning.