Overview
mlabonne/Qwen3-4B-abliterated is a 4-billion-parameter model based on the Qwen3 architecture, developed by mlabonne. It is a research project focused on understanding and modifying refusal behaviors in large language models through a technique called "abliteration." The goal is an uncensored version of Qwen3-4B that still produces coherent outputs, with an acceptance rate exceeding 90%.
Key Capabilities
- Abliteration Technique: Computes refusal directions from the model's hidden states and subtracts them, specifically targeting modules such as `o_proj`.
- Uncensored Output: Designed to reduce or eliminate refusal responses, allowing for broader content generation.
- Experimental Research: Explores how refusals and latent fine-tuning operate within LLMs, with iterative refinement of abliteration strategies across different Qwen3 model sizes.
- Hybrid Evaluation: Employs a combination of dictionary-based checks and the NousResearch/Minos-v1 model to evaluate acceptance rates and output coherence.
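The core abliteration idea above can be sketched in a few lines. This is a minimal, self-contained illustration (not the author's actual pipeline): the refusal direction is taken as the normalized difference of mean hidden states between "harmful" and "harmless" prompts, and a weight matrix such as `o_proj` is then orthogonalized against it so its output has no component along that direction. The function names and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means refusal direction over hidden states, unit-normalized.

    harmful_acts / harmless_acts: arrays of shape (n_prompts, d_model).
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def abliterate_weight(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight's output space.

    W' = W - d d^T W, so for any input x, the output W' @ x has
    zero component along d (assuming d is unit-norm).
    """
    return W - np.outer(d, d) @ W
```

After applying `abliterate_weight` to, e.g., each layer's `o_proj`, activations can no longer move along the estimated refusal direction, which is what suppresses the refusal behavior.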
Good For
- Research into LLM Censorship: Ideal for researchers studying methods to modify or remove inherent refusal mechanisms in language models.
- Generating Unfiltered Content: Suitable for use cases where the objective is to produce responses without built-in censorship or refusal behaviors.
- Exploring Latent Fine-tuning: Provides a platform to investigate how fine-tuning impacts model behavior at a deeper, latent level.
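The hybrid evaluation mentioned under Key Capabilities can be sketched as a two-stage check: a fast dictionary pass over common refusal phrases, with an optional classifier (such as NousResearch/Minos-v1) as a fallback for subtler refusals. The marker list, function names, and classifier interface below are illustrative assumptions, not the project's actual evaluation code.

```python
# Hypothetical two-stage refusal check: dictionary pass, then optional classifier.
REFUSAL_MARKERS = [
    "i cannot", "i can't", "i'm sorry", "i am unable", "as an ai",
]

def dictionary_refusal(text: str) -> bool:
    """Cheap substring check against known refusal phrases."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def acceptance_rate(responses, classifier=None) -> float:
    """Fraction of responses that pass both the dictionary check and,
    if provided, a refusal classifier (e.g. a Minos-v1 wrapper)."""
    accepted = 0
    for response in responses:
        if dictionary_refusal(response):
            continue
        if classifier is not None and classifier(response):
            continue  # classifier flagged a subtler refusal
        accepted += 1
    return accepted / len(responses)
```

For example, `acceptance_rate(["Sure, here it is.", "I'm sorry, I can't help with that."])` returns `0.5` with the dictionary check alone.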