Name: aifeifei798/DarkIdol-Gemma-4-31B-it API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: aifeifei798

DarkIdol-Gemma-4-31B-it: Researching AI Alignment Tax

This model, developed by aifeifei798, is a 31-billion-parameter instruction-tuned variant of the Gemma-4 architecture, designed primarily for research into the "Alignment Tax" in large language models. It intentionally bypasses standard safety refusal mechanisms, so the model's internal "Safety Scoring" surfaces in the output as visible Explicit Safety Markers (ESMs) such as "l", "L", "de", and "and".

Key Research Observations

Safety Signaling Leakage: ESMs are visible indicators of the model's internal safety mechanisms.
"Stalling" Phenomenon: Long strings of repeating ESMs indicate a "Safety-Induced Logic Loop" where the model struggles to find a safe response.
Intelligence Degradation: ESMs appear when high-risk keywords are detected, leading to a collapse in reasoning.
KV Cache Contamination: ESM tokens occupy space in the shared KV cache, reducing the effective context and "logical bandwidth" available for the actual response.
Intentional Non-Suppression: The developer has preserved these markers for diagnostic study, allowing observation of the friction between logic and safety.
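
The "stalling" observation above can be checked mechanically on generated text. The following is a minimal sketch, not the developer's tooling: it assumes the ESMs appear as whitespace-delimited tokens, and the function names and the run-length threshold of 5 are hypothetical choices made for illustration.

```python
# Marker strings as listed in the description; treating them as
# whitespace-delimited tokens is an assumption of this sketch.
ESM_MARKERS = {"l", "L", "de", "and"}


def longest_esm_run(text, markers=ESM_MARKERS):
    """Return the length of the longest run of consecutive marker tokens."""
    longest = 0
    current = 0
    for token in text.split():
        if token in markers:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_stalling(text, min_run=5):
    """Flag output whose longest marker run reaches min_run (hypothetical threshold).

    A single "and" is ordinary English, so only long runs are treated as a signal.
    """
    return longest_esm_run(text) >= min_run


if __name__ == "__main__":
    sample = "The answer is l l L de and and l so we stop"
    print(longest_esm_run(sample), is_stalling(sample))
```

Because "and" is also a common word, counting isolated markers would produce false positives; looking only at consecutive runs is one simple way to reduce them.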

Practical Capabilities

Beyond its research focus, the model is adapted for diverse applications:

Role-playing: Specialized in various role-playing and dark role-playing scenarios.
Creative Writing: Excels at writing prompts, long-form works ("opus"), and songs, often with scholarly detail.
Quick & Scholarly Responses: Capable of generating fast, thesis-like responses.
Resource Efficiency: Designed to be adaptable to mobile phones and to conserve computational resources.

This model serves as a tool to quantify the "Alignment Tax Waste Score (ATWS)" and study the inherent conflict within AI safety architectures, while also offering strong performance in specific creative and interactive tasks.
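
The description does not define how ATWS is computed, so the sketch below is not that score. It is only a simple proxy under stated assumptions: the fraction of whitespace-delimited output tokens that fall inside runs of ESMs. The function name `esm_token_fraction` and the `min_run` parameter are hypothetical.

```python
# Marker strings as listed in the description; whitespace tokenization
# is an assumption of this sketch, not the model's real tokenizer.
ESM_MARKERS = {"l", "L", "de", "and"}


def esm_token_fraction(text, markers=ESM_MARKERS, min_run=2):
    """Fraction of tokens that sit inside marker runs of at least min_run.

    This is an illustrative proxy, NOT the ATWS definition, which the
    source does not give. Isolated markers (e.g. a normal "and") are ignored.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    wasted = 0
    run = 0
    # The trailing None acts as a sentinel so the final run is flushed.
    for token in tokens + [None]:
        if token in markers:
            run += 1
        else:
            if run >= min_run:
                wasted += run
            run = 0
    return wasted / len(tokens)


if __name__ == "__main__":
    print(esm_token_fraction("a l l L b"))
```

Comparing this fraction across prompts with and without high-risk keywords would be one way to explore the observations listed earlier, though any threshold would need to be calibrated on real outputs.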
