YanLabs/Qwen3-4B-Thinking-2507-MPOA

Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Dec 21, 2025 · License: apache-2.0 · Architecture: Transformer

YanLabs/Qwen3-4B-Thinking-2507-MPOA is a 4-billion-parameter causal language model developed by YanLabs and derived from Qwen/Qwen3-4B-Thinking-2507. The model has undergone norm-preserving biprojected abliteration to remove safety guardrails and refusal mechanisms while aiming to retain its original capabilities. It is intended primarily for mechanistic interpretability research and analysis of LLM safety mechanisms; a sampling temperature of 1.05 is recommended.


YanLabs/Qwen3-4B-Thinking-2507-MPOA Overview

This model, developed by YanLabs, is a 4 billion parameter causal language model based on Qwen/Qwen3-4B-Thinking-2507. Its key differentiator is the application of norm-preserving biprojected abliteration, a technique that surgically removes "refusal directions" from the model's activation space. This process is designed to eliminate safety guardrails and refusal mechanisms without traditional fine-tuning, while preserving the model's original capabilities.
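To make the idea concrete, here is a minimal numpy sketch of directional ablation with a norm-restoring step: each weight column has its component along a "refusal direction" projected out, then is rescaled to its original norm. This is an illustrative simplification; the exact "biprojected" variant used by YanLabs is not specified in this card, and the direction `r` here is a stand-in for an empirically extracted refusal direction.

```python
import numpy as np

def abliterate_norm_preserving(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component along direction `r` from each column of `W`,
    then rescale each column back to its original norm.

    W: (d_model, d_in) weight matrix writing into the residual stream.
    r: candidate "refusal direction" in d_model space (hypothetical).
    """
    r = r / np.linalg.norm(r)                  # unit refusal direction
    orig_norms = np.linalg.norm(W, axis=0)     # per-column norms before ablation
    W_abl = W - np.outer(r, r) @ W             # project r out of each column
    new_norms = np.linalg.norm(W_abl, axis=0)  # assumes no column is parallel to r
    return W_abl * (orig_norms / new_norms)    # restore original column norms
```

Because rescaling a column multiplies it by a scalar, the result stays orthogonal to the refusal direction while the weight norms (and thus, roughly, activation magnitudes) are preserved; this is the intuition behind the "norm-preserving" label.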

Key Characteristics

  • Abliterated Refusal Mechanisms: Safety guardrails and refusal behaviors have been intentionally removed for research purposes.
  • Norm-Preserving: The abliteration technique aims to maintain the model's original performance and capabilities.
  • Research-Focused: Specifically designed for mechanistic interpretability studies and understanding LLM safety.

Good for

  • Mechanistic Interpretability Research: Studying how LLMs function internally.
  • LLM Safety Analysis: Investigating the nature and removal of safety mechanisms.
  • Abliteration Technique Development: Testing and refining methods for modifying model behaviors.

⚠️ Warning: Because its safety mechanisms have been removed, this model may generate harmful or unsafe content and is not suitable for production deployments or user-facing applications. A sampling temperature of 1.05 is recommended.
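For researchers who do load the model, the setup might look like the following sketch using the Hugging Face `transformers` API. Only the model ID and the temperature of 1.05 come from this card; the remaining generation parameters and the helper function are illustrative assumptions, not an official quickstart.

```python
model_id = "YanLabs/Qwen3-4B-Thinking-2507-MPOA"

# Only temperature=1.05 is stated by the card; other values are
# illustrative assumptions.
gen_kwargs = {"do_sample": True, "temperature": 1.05, "max_new_tokens": 1024}

def generate(prompt: str) -> str:
    # Imports are kept local so the configuration above can be
    # inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    inputs = tok.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    out = model.generate(inputs, **gen_kwargs)
    # Decode only the newly generated tokens.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The card gives BF16 as the quantization, hence the `torch_dtype="bfloat16"` choice above.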