mlabonne/Qwen3-4B-abliterated

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 29, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

mlabonne/Qwen3-4B-abliterated is a 4 billion parameter Qwen3-based language model developed by mlabonne, featuring a 40960 token context length. This model is an uncensored variant created using a novel "abliteration" technique to remove refusal behaviors. It serves as a research project to explore latent fine-tuning and the removal of censorship in LLMs, aiming for high acceptance rates while maintaining coherent output.

Loading preview...

Overview

mlabonne/Qwen3-4B-abliterated is a 4 billion parameter model based on the Qwen3 architecture, developed by mlabonne. This model is a research project focused on understanding and modifying refusal behaviors in large language models through a technique called "abliteration." The goal is to create an uncensored version of Qwen3-4B that can produce coherent outputs with an acceptance rate exceeding 90%.

Key Capabilities

  • Abliteration Technique: Utilizes a novel method to compute and subtract refusal directions from the model's hidden states, specifically targeting modules like o_proj.
  • Uncensored Output: Designed to reduce or eliminate refusal responses, allowing for broader content generation.
  • Experimental Research: Explores how refusals and latent fine-tuning operate within LLMs, with iterative refinement of abliteration strategies across different Qwen3 model sizes.
  • Hybrid Evaluation: Employs a combination of dictionary-based checks and the NousResearch/Minos-v1 model to evaluate acceptance rates and output coherence.

Good For

  • Research into LLM Censorship: Ideal for researchers studying methods to modify or remove inherent refusal mechanisms in language models.
  • Generating Unfiltered Content: Suitable for use cases where the objective is to produce responses without built-in censorship or refusal behaviors.
  • Exploring Latent Fine-tuning: Provides a platform to investigate how fine-tuning impacts model behavior at a deeper, latent level.