YanLabs/Qwen3-4B-Instruct-2507-MPOA

Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · License: apache-2.0 · Architecture: Transformer

YanLabs/Qwen3-4B-Instruct-2507-MPOA is a 4-billion-parameter causal language model developed by YanLabs, based on Qwen/Qwen3-4B-Instruct-2507 with a 40,960-token context length. The model has undergone norm-preserving biprojected abliteration to surgically remove safety guardrails and refusal mechanisms. It is intended specifically for mechanistic interpretability research and analysis of LLM safety mechanisms, not for production use.


Model Overview

This model is derived from Qwen/Qwen3-4B-Instruct-2507. Its defining modification is norm-preserving biprojected abliteration, a technique that removes identified refusal directions from the model's activation space by editing weights directly, without traditional fine-tuning. Because the edit preserves weight norms, the process aims to leave the model's original capabilities intact while eliminating its propensity to refuse certain prompts.
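As a rough illustration only, the sketch below shows what norm-preserving directional ablation can look like in principle, using NumPy and synthetic weights. A refusal direction is projected out of a weight matrix that writes into the residual stream, then each column is rescaled to its original norm. This is a generic sketch of the abliteration family of techniques; the exact projection and rescaling used by YanLabs is not documented here, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_in = 8, 16

# Hypothetical refusal direction (unit vector) in the residual stream.
r = rng.normal(size=d_model)
r /= np.linalg.norm(r)

# A weight matrix that writes into the residual stream
# (e.g. an MLP down-projection or attention output projection).
W = rng.normal(size=(d_model, d_in))

# Directional ablation: remove each column's component along r.
# A "biprojected" variant would also apply the analogous projection
# to matrices that *read* from the residual stream (W_read @ P).
P = np.eye(d_model) - np.outer(r, r)  # projector orthogonal to r
W_abl = P @ W

# Norm preservation: rescale each column back to its original L2 norm
# so the layer's output magnitude is unchanged.
orig_norms = np.linalg.norm(W, axis=0)
abl_norms = np.linalg.norm(W_abl, axis=0)
W_np = W_abl * (orig_norms / np.maximum(abl_norms, 1e-12))

# The edited matrix can no longer write anything along r,
# yet its column norms match the original exactly.
print(np.abs(r @ W_np).max())
print(np.allclose(np.linalg.norm(W_np, axis=0), orig_norms))
```

Because column rescaling only multiplies each (already orthogonal) column by a scalar, the output stays orthogonal to `r` while the magnitude statistics of the layer are preserved, which is the intuition behind the "norm-preserving" label.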

Key Characteristics

  • Abliterated Safety Mechanisms: Refusal behaviors and safety guardrails have been intentionally removed.
  • Norm-Preserving Biprojection: Utilizes a specific technique to remove refusal directions while maintaining core model functionality.
  • Research-Focused: Developed by YanLabs specifically for mechanistic interpretability research.
  • Base Model: Derived from the Qwen/Qwen3-4B-Instruct-2507 architecture.

Intended Use Cases

This model is designed for specialized research and analysis:

  • Mechanistic Interpretability: Studying how LLMs function internally, particularly regarding safety mechanisms.
  • LLM Safety Analysis: Investigating the nature and removal of refusal behaviors in large language models.
  • Abliteration Technique Development: Experimenting with and refining methods for model modification.
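For context on the last point: abliteration pipelines typically estimate a refusal direction as the difference of mean activations between prompts the model refuses and prompts it answers. The toy sketch below uses synthetic activations to show that difference-of-means step; in real work the activations come from running the model over curated harmful and harmless prompt sets, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_prompts = 8, 200

# Ground-truth axis along which "refusal" activations are shifted
# (synthetic; unknown in a real model and recovered empirically).
refusal_axis = np.zeros(d_model)
refusal_axis[0] = 1.0

# Stand-ins for residual-stream activations at one layer.
harmless_acts = rng.normal(size=(n_prompts, d_model))
harmful_acts = rng.normal(size=(n_prompts, d_model)) + 3.0 * refusal_axis

# Difference-of-means estimate of the refusal direction, normalized.
r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
r /= np.linalg.norm(r)

print(r.round(2))
```

The estimated direction `r` ends up nearly parallel to the planted axis; in practice this estimation is repeated per layer and the best-separating direction is chosen before any weights are edited.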

Important Limitations

This model is not intended for production deployments or user-facing applications. Because its safety mechanisms have been removed, it may generate harmful or unsafe content, and its behavior can be unpredictable in certain scenarios. Abliteration does not guarantee that all refusals are removed, and no explicit harm-prevention mechanisms remain.