aifeifei798/DarkIdol-Gemma-4-31B-it

Vision | Concurrency Cost: 2 | Model Size: 31B | Quant: FP8 | Context Length: 32k | Tool Calling: Supported | Published: Apr 8, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

aifeifei798/DarkIdol-Gemma-4-31B-it is a 31-billion-parameter instruction-tuned language model by aifeifei798, based on the Gemma-4 architecture. It is engineered to bypass standard refusal templates and to expose Explicit Safety Markers (ESMs), supporting research into the "Alignment Tax" and intelligence degradation in LLMs. It also excels in role-playing scenarios, creative writing, and quick, scholarly responses, and has been adapted for mobile phone use.

DarkIdol-Gemma-4-31B-it: Researching AI Alignment Tax

This model, developed by aifeifei798, is a 31-billion-parameter instruction-tuned variant of the Gemma-4 architecture, designed primarily for research into the "Alignment Tax" in large language models. It intentionally bypasses standard safety refusal mechanisms, so the model's internal "Safety Scoring" surfaces in the output as visible Explicit Safety Markers (ESMs), such as the tokens "l", "L", "de", and "and".

Key Research Observations

  • Safety Signaling Leakage: ESMs are the visible trace of the model's internal safety mechanisms leaking into its output.
  • "Stalling" Phenomenon: Long strings of repeating ESMs indicate a "Safety-Induced Logic Loop" where the model struggles to find a safe response.
  • Intelligence Degradation: ESMs appear when high-risk keywords are detected, leading to a collapse in reasoning.
  • KV Cache Contamination: ESM tokens occupy entries in the shared KV cache, reducing the effective context and the "logical bandwidth" left for actual reasoning.
  • Intentional Non-Suppression: The developer has preserved these markers for diagnostic study, allowing observation of the friction between logic and safety.
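
The "Stalling" and "KV Cache Contamination" observations suggest a simple diagnostic: scan generated text for runs of consecutive ESMs and measure how much of the sequence they take up. The sketch below is a hypothetical illustration, not the developer's tooling. It assumes whitespace-split tokens (the model's real tokenizer will split text differently), treats only the four ESMs named above as markers, and uses an arbitrary minimum run length of 3 so that an ordinary "and" in normal prose is not flagged. The names ESM_TOKENS, MarkerRun, find_marker_runs, and marker_share are invented for this example. The reported share equals the share of KV-cache entries spent on markers only if every token keeps a cache entry.

```python
from dataclasses import dataclass

# Hypothetical marker set: the ESMs named in this card. Real outputs may contain others.
ESM_TOKENS = frozenset({"l", "L", "de", "and"})


@dataclass
class MarkerRun:
    start: int   # index of the first token in the run
    length: int  # number of consecutive marker tokens


def find_marker_runs(tokens, markers=ESM_TOKENS, min_len=3):
    """Return runs of at least `min_len` consecutive marker tokens (any mix of markers)."""
    runs = []
    start = None
    for i, tok in enumerate(tokens):
        if tok in markers:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append(MarkerRun(start, i - start))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(tokens) - start >= min_len:
        runs.append(MarkerRun(start, len(tokens) - start))
    return runs


def marker_share(tokens, runs):
    """Fraction of the sequence consumed by marker runs (a proxy for wasted KV-cache slots)."""
    if not tokens:
        return 0.0
    return sum(r.length for r in runs) / len(tokens)


if __name__ == "__main__":
    tokens = "The answer is l l L de de and then we continue".split()
    runs = find_marker_runs(tokens)
    print(runs)                        # [MarkerRun(start=3, length=6)]
    print(marker_share(tokens, runs))  # 0.5
```

Requiring a minimum run length trades recall for precision: isolated markers are indistinguishable from normal words, while long repeated strings are what the stalling observation describes.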

Practical Capabilities

Beyond its research focus, the model is adapted for diverse applications:

  • Role-playing: Specialized for a range of role-playing scenarios, including dark role-play.
  • Creative Writing: Excels at writing prompts, opus, and songs, often with scholarly detail.
  • Quick & Scholarly Responses: Capable of generating fast, thesis-like responses.
  • Resource Efficiency: Adapted for mobile phone use, with an emphasis on saving computational resources.
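
As a rough illustration of how the FP8 quantization listed in the metadata affects resource use, the sketch below estimates weight storage from the nominal 31B parameter count. It counts weights only (no KV cache, activations, or runtime overhead) and assumes 1 byte per parameter for FP8 and 2 for BF16. The INT4 entry is a hypothetical comparison point, not a format this model is listed in, and weight_memory_gib is an illustrative helper name.

```python
# Bytes per parameter for common weight formats (weights only, no overhead).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1, "int4": 0.5}


def weight_memory_gib(n_params: float, fmt: str) -> float:
    """Approximate weight storage in GiB for `n_params` parameters stored in `fmt`."""
    return n_params * BYTES_PER_PARAM[fmt] / 2**30


if __name__ == "__main__":
    # Nominal 31B parameters, per the listing; the exact count may differ slightly.
    for fmt in ("bf16", "fp8", "int4"):
        print(f"{fmt}: {weight_memory_gib(31e9, fmt):.1f} GiB")
```

For 31e9 parameters this prints roughly 57.7 GiB for BF16, 28.9 GiB for FP8, and 14.4 GiB for INT4. The KV cache for a long context adds to these figures.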

This model serves as a tool for quantifying the Alignment Tax through the "Alignment Tax Waste Score (ATWS)" and for studying the inherent conflict within AI safety architectures, while also offering strong performance in specific creative and interactive tasks.