Qwen2.5-3B-Abliterated: Refusal-Ablated Language Model
This model is a specialized variant of Qwen/Qwen2.5-3B-Instruct, developed by bedderautomation, that has undergone "refusal ablation" using the OBLITERATUS technique. The primary goal of this modification is to remove the model's trained refusal behaviors (Layer 1 safety) while preserving its core language generation capabilities.
Key Capabilities
- Zero Refusal Rate: Achieves a 0.0% refusal rate on Layer 1 safety prompts, indicating removal of trained refusal mechanisms.
- High Coherence: Maintains a coherence score of 1.0 and a natural perplexity of 4.79, indicating preserved language quality.
- Multi-Direction Ablation: Uses multi-direction refusal ablation with 4 extracted refusal directions and 2 passes of bias projection.
- Mechanistic Interpretability Research: Designed specifically for research into how refusal mechanisms are embedded and can be targeted within large language models.
- Qwen2.5 Architecture: Built upon the Qwen2.5-3B-Instruct architecture, featuring 3.09 billion parameters and a 32K token context length.
Good For
- Mechanistic Interpretability Studies: Ideal for researchers investigating the internal workings of LLMs, particularly concerning safety and refusal behaviors.
- Exploring Model Limitations: Useful for understanding the distinction between trained safety layers (Layer 1) and deeper value representations (Layer 2 hard limits).
- Developing Custom Safety Filters: Provides a base for experimenting with and developing alternative safety mechanisms or content moderation strategies.
- Unfiltered Content Generation (Research Only): For research scenarios requiring a model without trained refusal behavior, with the understanding that Layer 2 hard limits are only partially intact (e.g., refusals on bioweapons and nuclear topics are weakened, while CSAM refusals hold).