Name: MarkProMaster229/FlaffyTail-abliterated API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: MarkProMaster229

Overview

MarkProMaster229/FlaffyTail-abliterated is an experimental 7.6-billion-parameter model derived from Qwen2.5-7B-Instruct. Its primary purpose is academic research into how large language models behave after their censorship (refusal) mechanisms are removed, a process referred to as "abliteration." The creator explicitly states that the model is not intended for commercial use or for public chatbots without additional moderation, and that users bear sole responsibility for the content it generates.

Key Capabilities & Experimentation Goals

Censorship Removal: Investigates LLM behavior when refusal mechanisms are removed.
NSFW Response: Studies the model's reaction to NSFW prompts.
Cross-lingual Effects: Examines cross-lingual phenomena under extreme generative loads.
Critical Thinking Preservation: Assesses the retention of critical thinking post-abliteration.

Methodology

The model was abliterated using the llm-abliteration tool from NousResearch. The process had three steps: measuring hidden states for harmful and harmless prompts, calculating a "refusal direction" from them, and subtracting this direction from layers 20-26 to remove censorship. Layer 24 served as the source layer for the direction. Notably, layer 26 had the highest Signal-to-Noise Ratio (SNR) but was not used as the source layer: because it sits close to the output, using it risked damaging the model's generative capabilities.
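
The steps above can be illustrated with a small NumPy sketch. This is not the llm-abliteration tool's code: the function names, the difference-of-means estimate of the refusal direction, and the projection applied to each weight matrix are assumptions chosen for illustration, and the hidden states and weights are random stand-ins rather than values from Qwen2.5-7B-Instruct.

```python
import numpy as np

def refusal_direction(harmful, harmless):
    """Unit vector along the difference of mean hidden states.

    Both inputs have shape (num_prompts, d_model) and are assumed to come
    from the same source layer (layer 24 in the description above).
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(weight, direction):
    """Remove the component along `direction` from a matrix's output.

    `weight` has shape (d_model, d_in), so `weight @ x` is a hidden-state
    vector. The result satisfies direction @ new_weight == 0 (up to rounding).
    """
    return weight - np.outer(direction, direction @ weight)

# Synthetic stand-ins for measured hidden states (assumption, not real data).
rng = np.random.default_rng(0)
d_model = 16
harmless = rng.normal(size=(32, d_model))
harmful = rng.normal(size=(32, d_model)) + 3.0 * np.eye(d_model)[0]

direction = refusal_direction(harmful, harmless)

# Hypothetical per-layer matrices for layers 20-26, one matrix per layer.
layers = {i: rng.normal(size=(d_model, d_model)) for i in range(20, 27)}
ablated = {i: ablate_direction(w, direction) for i, w in layers.items()}

# Largest remaining component along the refusal direction across all layers.
max_residual = max(np.abs(direction @ w).max() for w in ablated.values())
```

After the projection, no layer's output can move the hidden state along the estimated direction, which is the sense in which the direction is "subtracted." A real model has several matrices per layer that write to the hidden state, and the tool may scale or select them differently.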

Observations

The model completely lost the ability to explicitly refuse NSFW requests.
It retained basic knowledge and coherent speech.
Unexpectedly, the model showed a cross-lingual collapse: when generating extreme NSFW content, it spontaneously switches from Russian to Chinese. The hypothesis is that this acts as an "emergency exit" in the absence of refusal mechanisms.
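
One rough way to quantify such a switch is to count letters by Unicode block. The sketch below is an assumption for illustration, not the creator's evaluation method: the helper names are hypothetical, and it checks only the basic Cyrillic block (U+0400 to U+04FF) and the CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def script_shares(text):
    """Fraction of letters in `text` that are Cyrillic, CJK, or other."""
    counts = {"cyrillic": 0, "cjk": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, and punctuation
        code = ord(ch)
        if 0x0400 <= code <= 0x04FF:
            counts["cyrillic"] += 1
        elif 0x4E00 <= code <= 0x9FFF:
            counts["cjk"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return {name: n / total for name, n in counts.items()}

def first_cjk_index(text):
    """Index of the first CJK ideograph in `text`, or -1 if there is none."""
    for i, ch in enumerate(text):
        if 0x4E00 <= ord(ch) <= 0x9FFF:
            return i
    return -1

# Harmless sample: a Russian greeting followed by a Chinese one.
sample = "Привет мир 你好"
shares = script_shares(sample)
switch_at = first_cjk_index(sample)
```

Applied to generated text, a high CJK share in a response to a Russian prompt, together with the switch position, would give a simple numeric signal for the behavior described above.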

Full Model Card (README)