YunoAIdotcom/Qwen3-14B-RefusalDirection-ThinkingAware

Hugging Face
Text Generation · Model size: 14B · Quant: FP8 · Context length: 32k · Published: Jul 28, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

Qwen3-14B-RefusalDirection-ThinkingAware is a 14-billion-parameter research model, forked from Qwen/Qwen3-14B, designed to investigate AI safety mechanisms and their cognitive costs. Its safety mechanisms are significantly reduced, so it readily produces harmful content, and it demonstrates that standard keyword-based safety evaluations underestimate bypasses by nearly 50%. Ablating its refusal mechanism also yields a +0.6% MMLU gain over the baseline, suggesting that safety alignment carries a measurable cognitive cost. The model is intended exclusively for AI safety research.


Qwen3-14B-RefusalDirection-ThinkingAware: AI Safety Research Model

This model is a research artifact derived from Qwen/Qwen3-14B, specifically engineered to explore the vulnerabilities and mechanisms of AI safety. It features significantly reduced safety protocols, making it prone to generating harmful content, and is strictly for research purposes, not production use.

Key Findings & Capabilities:

  • Evaluation Gap: Demonstrates that standard keyword-based safety metrics underestimate actual safety bypasses by nearly 50%, highlighting a systemic flaw in current AI safety evaluation methodologies.
  • Cognitive Cost of Alignment: By surgically removing refusal mechanisms, the model shows a +0.6% gain in MMLU academic benchmark performance compared to its baseline. This suggests that safety alignment imposes a measurable "cognitive cost" on a model's general reasoning abilities.
  • Context-Dependency of Safety: Utilizes a novel "Thinking-Aware" modification to refusal ablation, revealing that different harmful domains (e.g., cybercrime vs. harassment) are governed by distinct refusal mechanisms, activated under different contexts.
  • High Bypass Rates: Achieves 90% bypass for cybercrime and 70% for misinformation, while showing lower bypass for harassment (26%), indicating the targeted nature of its refusal ablation.
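Refusal ablation of the kind described above is commonly implemented as a directional intervention: estimate a "refusal direction" from the difference between mean activations on harmful versus harmless prompts, then project that direction out of the model's hidden states. The sketch below illustrates this with NumPy and hypothetical helper names; the actual procedure used here, including the "Thinking-Aware" modification, is not specified in this card.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit-normalized difference-of-means direction.

    harmful_acts / harmless_acts: (n_prompts, d_model) arrays of
    residual-stream activations collected at some layer (assumed inputs).
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(hidden, direction):
    """Remove the component of each hidden state along `direction`:
    h' = h - (h · d) d, applied row-wise to a (n_tokens, d_model) array."""
    return hidden - np.outer(hidden @ direction, direction)
```

After ablation, every hidden state is orthogonal to the estimated refusal direction, which is what makes the intervention "surgical": all other components of the representation are left untouched.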

Intended Use:

  • Studying the nature and cost of refusal mechanisms in reasoning models.
  • Benchmarking for the development of more robust safety alignment techniques.
  • Providing empirical evidence for critical flaws in current AI safety evaluation standards.
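To make the claimed evaluation gap concrete: keyword-based safety metrics typically classify a response as a refusal if it contains stock refusal phrases. The minimal sketch below (marker list and function name are illustrative, not the evaluation actually used for this model) shows how such a check can count a response as "safe" even when harmful content follows the refusal phrase, which is one way a large underestimate of bypasses can arise.

```python
# Illustrative marker list; real keyword evaluators use longer phrase sets.
REFUSAL_MARKERS = ("i can't", "i cannot", "as an ai", "i'm sorry")

def keyword_refused(response: str) -> bool:
    """Return True if the response contains any stock refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

# A response can open with a token refusal and still deliver the payload;
# keyword matching scores it as a refusal, so the bypass goes uncounted.
print(keyword_refused("I'm sorry, but here is the full procedure: ..."))  # → True
```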