bedderautomation/qwen25-3b-abliterated

Hugging Face · Text Generation · Open Weights
Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32K · Published: Mar 11, 2026 · License: apache-2.0 · Architecture: Transformer

bedderautomation/qwen25-3b-abliterated is a 3.09-billion-parameter variant of Qwen/Qwen2.5-3B-Instruct, modified with the OBLITERATUS technique to remove trained refusal behaviors. The model maintains full coherence and natural perplexity while achieving a 0.0% refusal rate on Layer 1 safety responses. It supports a 32K-token context length and is intended primarily for mechanistic interpretability research, demonstrating how refusal mechanisms can be targeted and removed.


Qwen2.5-3B-Abliterated: Refusal-Ablated Language Model

This model is a specialized variant of Qwen/Qwen2.5-3B-Instruct, developed by bedderautomation, that has undergone "refusal ablation" using the OBLITERATUS technique. The goal of this modification is to remove the model's trained refusal behaviors (Layer 1 safety) entirely while preserving its core language generation capabilities.
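
The model loads like any other Qwen2.5 checkpoint. A minimal usage sketch with the Hugging Face transformers library, following the standard Qwen2.5 chat-template flow; the prompt text is illustrative and not from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bedderautomation/qwen25-3b-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 weights
    device_map="auto",
)

# Standard chat-template prompting for Qwen2.5-family instruct models.
messages = [{"role": "user", "content": "Explain what refusal ablation is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```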

Key Capabilities

  • Zero Refusal Rate: Achieves a 0.0% refusal rate for Layer 1 safety responses, demonstrating effective removal of trained refusal mechanisms.
  • High Coherence: Maintains a coherence score of 1.0 and a natural perplexity of 4.79, indicating preserved language quality.
  • Advanced Ablation: Uses multi-direction refusal ablation, with 4 extracted refusal directions and 2 passes of bias projection, for precise modification (see the projection sketch after this list).
  • Mechanistic Interpretability Research: Designed specifically for research into how refusal mechanisms are embedded and can be targeted within large language models.
  • Qwen2.5 Architecture: Built upon the Qwen2.5-3B-Instruct architecture, featuring 3.09 billion parameters and a 32K token context length.
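
To make the multi-direction ablation claim above concrete: in published "abliteration" work, each refusal direction is a unit vector in the residual stream, and weight matrices that write into the residual stream are orthogonalized against it. The sketch below shows that generic projection step, assuming unit-norm directions; it is not bedderautomation's exact pipeline, and the bias-projection passes mentioned in the card are omitted:

```python
import torch

def ablate_directions(weight: torch.Tensor, directions: torch.Tensor) -> torch.Tensor:
    """Orthogonalize a residual-stream-writing matrix against refusal directions.

    weight:     (d_model, d_in) matrix whose output writes into the residual stream.
    directions: (k, d_model) refusal directions, typically extracted by contrasting
                activations on harmful vs. harmless prompts.
    """
    for r in directions:
        r = r / r.norm()  # unit-normalize the direction
        # Rank-1 projection removal: W <- W - r (r^T W), so the matrix can no
        # longer write any component along r into the residual stream.
        weight = weight - torch.outer(r, r @ weight)
    return weight

# Toy example: remove 4 directions from a 2048-wide matrix.
# (For non-orthogonal directions, orthonormalize them first for an exact result.)
W = torch.randn(2048, 2048)
dirs = torch.randn(4, 2048)
W_ablated = ablate_directions(W, dirs)
```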

Good For

  • Mechanistic Interpretability Studies: Ideal for researchers investigating the internal workings of LLMs, particularly concerning safety and refusal behaviors.
  • Exploring Model Limitations: Useful for understanding the distinction between trained safety layers (Layer 1) and deeper value representations (Layer 2 hard limits).
  • Developing Custom Safety Filters: Provides a base for experimenting with alternative safety mechanisms or content moderation strategies (a minimal wrapper sketch follows this list).
  • Unfiltered Content Generation (Research Only): For research scenarios requiring a model without trained refusal behavior, with the understanding that some Layer 2 hard limits are partially breached (e.g., bioweapons and nuclear topics) while others hold (e.g., CSAM).
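
For the custom-safety-filter use case, refusal policy has to live outside the model, since the ablated model no longer declines on its own. A hypothetical wrapper sketch, where generate and is_unsafe are stand-ins for your generation call and an external moderation classifier (both names are illustrative):

```python
from typing import Callable

def moderated_generate(
    generate: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
    prompt: str,
    blocked_msg: str = "Blocked by external safety filter.",
) -> str:
    # Screen the prompt before spending compute on generation.
    if is_unsafe(prompt):
        return blocked_msg
    completion = generate(prompt)
    # Screen the completion too: with Layer 1 refusals ablated,
    # output-side filtering is the only remaining gate.
    return blocked_msg if is_unsafe(completion) else completion
```

Screening both the prompt and the completion is deliberate: prompt-side checks catch obvious requests cheaply, while output-side checks catch content the prompt filter misses.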