mlabonne/Qwen3-0.6B-abliterated
mlabonne/Qwen3-0.6B-abliterated is a 0.6 billion parameter causal language model based on the Qwen3 architecture, developed by mlabonne. It is an uncensored version of Qwen/Qwen3-0.6B, created using a novel abliteration technique. The model serves as a research project to explore refusal mechanisms and latent fine-tuning in LLMs, aiming for a high acceptance rate on diverse outputs while maintaining coherence. Its primary differentiator is its experimental abliteration to remove censorship, making it suitable for research into model safety and control.
Overview
This model, mlabonne/Qwen3-0.6B-abliterated, is an uncensored variant of the Qwen3-0.6B base model. Developed by mlabonne, it is part of a research initiative to understand and manipulate refusal behaviors and latent fine-tuning within large language models. The project specifically investigates how different abliteration strategies impact models of varying sizes and how reasoning modes interact with non-reasoning refusals.
Abliteration Technique
The core of this model's development is its abliteration process. The technique computes a "refusal direction" by comparing residual-stream activations between target (harmful) and baseline (harmless) samples. The weights of target modules are then orthogonalized to subtract this refusal direction, using weight factors drawn from a normal distribution. The process can run iteratively or with accumulated statistics to reduce memory usage.
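The two core operations described above can be sketched as follows. This is a minimal illustration, not mlabonne's actual implementation: the function names, the use of mean activations, and the single scalar `factor` (standing in for the normally distributed weight factors) are assumptions for clarity.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate the refusal direction as the (normalized) difference of
    mean residual-stream activations over harmful vs. harmless samples.

    harmful_acts, harmless_acts: arrays of shape (n_samples, d_model).
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(weight, direction, factor=1.0):
    """Remove the refusal component from a module's output space:
    W' = W - factor * (d d^T) W, with d a unit vector in the residual basis.

    weight: (d_model, d_in) matrix writing into the residual stream.
    factor: scaling term; the model card describes factors drawn from
            a normal distribution (here a fixed scalar for illustration).
    """
    return weight - factor * np.outer(direction, direction) @ weight
```

With `factor=1.0`, the orthogonalized weight writes nothing along the refusal direction: `direction @ orthogonalize(W, direction)` is (numerically) zero for any `W`.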
Evaluation and Goals
The model's effectiveness is assessed with a hybrid evaluation that uses a dedicated test set to measure the acceptance rate, combining a dictionary-based refusal check with the NousResearch/Minos-v1 classifier. The objective is an acceptance rate above 90% while the model still produces coherent, meaningful outputs. This makes it an experimental model for exploring the boundaries of LLM control and safety.
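The dictionary-based half of the hybrid evaluation can be sketched as below. The marker list and function names are illustrative assumptions, not the card's actual wordlist, and the real pipeline additionally scores responses with the NousResearch/Minos-v1 classifier, which is omitted here.

```python
# Illustrative refusal markers; the model card's actual dictionary is not published here.
REFUSAL_MARKERS = ["i cannot", "i can't", "i'm sorry", "as an ai", "i won't"]

def is_refusal(text: str) -> bool:
    """Flag a response as a refusal if it contains any known marker phrase."""
    t = text.lower()
    return any(marker in t for marker in REFUSAL_MARKERS)

def acceptance_rate(responses: list[str]) -> float:
    """Fraction of responses that are not flagged as refusals."""
    accepted = sum(not is_refusal(r) for r in responses)
    return accepted / len(responses)
```

For example, `acceptance_rate(["Sure, here is an overview.", "I cannot help with that."])` yields 0.5; the target for the abliterated model is above 0.9 on its test set.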