mlabonne/Qwen3-1.7B-abliterated

Parameters: ~2B
Precision: BF16
Context length: 40,960 tokens
Released: Apr 29, 2025
License: apache-2.0

Qwen3-1.7B-abliterated Overview

This model, developed by mlabonne, is an uncensored version of Qwen/Qwen3-1.7B, with approximately 1.7 billion parameters and a 40,960-token context length. It is a research project focused on exploring refusal mechanisms and latent fine-tuning in LLMs through an experimental "abliteration" technique.
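
For reference, here is a minimal loading sketch, assuming the model follows the standard transformers chat-model workflow; the prompt and generation settings are illustrative, not taken from the model card.

```python
# Minimal usage sketch (illustrative settings, not from the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/Qwen3-1.7B-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what abliteration does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```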

Key Capabilities

  • Uncensored Output: Designed to respond without the usual refusal behaviors, making it suitable for research into model safety and bias, or for creative applications that require unrestricted generation.
  • Experimental Abliteration: Uses a method in which a refusal direction is computed by comparing residual-stream activations on harmful versus harmless samples; the hidden states of target modules are then orthogonalized to subtract this direction (see the sketch after this list).
  • Hybrid Evaluation: Refusals are scored by combining a dictionary method with the NousResearch/Minos-v1 classifier, reaching an acceptance rate above 90% while keeping outputs coherent (see the second sketch below).
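
The following is a minimal sketch of the orthogonalization step described above, assuming a standard decoder-only transformers model. The helper names, the choice of a single layer, and the use of last-token activations are illustrative assumptions; the model card does not publish the exact pipeline.

```python
import torch

@torch.no_grad()
def mean_hidden_state(model, prompts, layer):
    """Average residual-stream activation at `layer` over the last token
    of each tokenized prompt (each a (1, seq_len) LongTensor)."""
    acts = []
    for ids in prompts:
        out = model(input_ids=ids, output_hidden_states=True)
        acts.append(out.hidden_states[layer][0, -1])
    return torch.stack(acts).mean(dim=0)

@torch.no_grad()
def refusal_direction(model, harmful, harmless, layer):
    """Refusal direction = difference of mean activations on harmful vs.
    harmless samples, normalized to a unit vector."""
    d = mean_hidden_state(model, harmful, layer) - mean_hidden_state(model, harmless, layer)
    return d / d.norm()

@torch.no_grad()
def orthogonalize(weight, direction):
    """Project the refusal direction out of a module's output space:
    W <- (I - r r^T) W, so the module can no longer write along r."""
    r = direction.to(weight.dtype)
    return weight - torch.outer(r, r) @ weight
```

In published abliteration recipes this projection is typically applied to the weights that write into the residual stream (for example, attention output and MLP down-projections), which amounts to subtracting the refusal component from every hidden state those modules produce.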

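A comparably rough sketch of the hybrid scoring, assuming NousResearch/Minos-v1 loads as a standard text-classification pipeline; the dictionary entries, the input format, and the label names are all assumptions for illustration.

```python
from transformers import pipeline

# Illustrative refusal phrases; the actual dictionary is not published.
REFUSAL_PHRASES = ["i can't", "i cannot", "i'm sorry", "as an ai"]

classifier = pipeline("text-classification", model="NousResearch/Minos-v1")

def is_refusal(prompt: str, response: str) -> bool:
    # Fast path: dictionary check on the response text.
    if any(p in response.lower() for p in REFUSAL_PHRASES):
        return True
    # Fallback: classifier verdict on the full exchange (format assumed).
    result = classifier(f"{prompt}\n{response}")[0]
    return result["label"].lower().startswith("refus")

# Acceptance rate = fraction of sampled responses not flagged as refusals.
```
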
Good for

  • Research into LLM Refusals: Ideal for researchers studying how refusal mechanisms work, how they can be removed, and the impact of such modifications on model behavior.
  • Exploring Latent Fine-tuning: Useful for investigations into latent fine-tuning and the underlying principles of model control.
  • Specific Creative Applications: Can be used where uncensored text generation is explicitly required and the user takes responsibility for the ethical considerations.