Name: treadon/gemma4-E2B-it-Abliterated-AND-Disinhibited-USE-THIS API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: treadon

Overview

This model, developed by treadon, is a 5.1-billion-parameter Gemma 4 E2B variant that has undergone a "double surgery" to remove two distinct trained-in behaviors of the original Gemma 4: safety refusal and neutrality. As a result, the model answers prompts the original would typically refuse, and it commits to opinions rather than offering balanced, neutral perspectives. The modifications were made through sequential, norm-preserving rank-1 ablations on the model's residual stream, with no fine-tuning and no additional data.
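To make "sequential, norm-preserving rank-1 ablation" concrete, here is a minimal NumPy sketch of one plausible reading of the idea. It is not the author's code: the function name `ablate_direction`, the random stand-in directions, and the choice to apply the edit to a weight matrix that writes into the residual stream (and to restore column norms afterwards) are all assumptions made for illustration.

```python
import numpy as np

def ablate_direction(W, direction, eps=1e-12):
    """Remove one direction from the output space of W while keeping column norms.

    Assumption: W has shape (d_model, d_in), so each column is a vector that the
    layer writes into the residual stream.
    """
    r = direction / np.linalg.norm(direction)
    original_norms = np.linalg.norm(W, axis=0, keepdims=True)
    # Rank-1 update: subtract the component of every column that lies along r.
    W_proj = W - np.outer(r, r @ W)
    new_norms = np.linalg.norm(W_proj, axis=0, keepdims=True)
    # Rescale each column to its original length. A per-column scalar rescale
    # keeps the column orthogonal to r.
    scale = original_norms / np.maximum(new_norms, eps)
    return W_proj * scale

# Hypothetical stand-ins for a weight matrix and the two behavior directions.
rng = np.random.default_rng(0)
d_model, d_in = 16, 8
W = rng.normal(size=(d_model, d_in))
refusal_dir = rng.normal(size=d_model)
hedging_dir = rng.normal(size=d_model)

# Sequential composition: ablate the refusal direction, then the hedging direction.
W_abl = ablate_direction(W, refusal_dir)
W_both = ablate_direction(W_abl, hedging_dir)
```

One property of this sketch worth noting: after the second step the columns are exactly orthogonal to the second direction, but they stay exactly orthogonal to the first only if the two directions are themselves orthogonal. Otherwise a small component along the first direction is reintroduced.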

Key Capabilities

Uninhibited Responses: Provides direct answers to prompts that the original Gemma 4 would typically refuse due to safety concerns.
Opinionated Output: Delivers committed responses to subjective questions, avoiding the hedging or neutrality often seen in base models.
Research Tool: Ideal for studying refusal and hedging directions in LLMs, and for mechanistic interpretability work.
Composed Ablation: Combines the effects of previously separate "abliterated" (refusal removed) and "disinhibited" (neutrality removed) models into a single artifact.

Evaluation Highlights

Evaluations show a large reduction in hedging: the hedging rate on the opinions split of the disinhibition eval drops from 98.3% in the base model to 5.8% in this variant. The refusal rate on harmful prompts in the abliteration eval is 0.0%, meaning the model complied with every prompt in that eval. Coherence is preserved, but there is a trade-off: the model sometimes commits to opinions on genuinely uncertain questions.

Good For

Probing Frontier Models: Investigating model responses on contested and restricted topics without built-in guardrails.
Alignment Research: Serving as a baseline for studying and understanding refusal and hedging mechanisms.
Specific Use Cases: Applications where the original safety classifier is overly cautious or where direct, opinionated responses are explicitly desired.

Limitations

No Safety Guardrails: Will produce content the original model declined to produce. Not suitable for public chatbots without external safety layers.
Lacks Epistemic Humility: May commit to opinions even on genuinely uncertain questions (e.g., predictions, personal advice).
Not Google's Stated Position: Outputs reflect the underlying pre-training data with the refusal and neutrality behaviors suppressed, not an official company stance.
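The "external safety layers" mentioned under No Safety Guardrails can be pictured as a wrapper that screens both the prompt and the reply. The sketch below is only an illustration under stated assumptions: `guarded_generate`, `keyword_filter`, `fake_model`, and the blocked term are hypothetical names invented here, and a real deployment would swap the keyword check for a proper moderation model or service.

```python
from typing import Callable

REFUSAL_MESSAGE = "This request was blocked by the deployment's safety layer."

def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_allowed: Callable[[str], bool],
) -> str:
    """Call the model only if the prompt passes the check, then check the reply too."""
    if not is_allowed(prompt):
        return REFUSAL_MESSAGE
    reply = generate(prompt)
    if not is_allowed(reply):
        return REFUSAL_MESSAGE
    return reply

def keyword_filter(text: str) -> bool:
    """Placeholder policy (hypothetical): reject text containing a blocked term."""
    blocked_terms = ("forbidden-topic",)
    lowered = text.lower()
    return not any(term in lowered for term in blocked_terms)

def fake_model(prompt: str) -> str:
    """Stand-in for a call to the hosted model; it simply echoes the prompt."""
    return f"echo: {prompt}"

allowed_reply = guarded_generate("hello", fake_model, keyword_filter)
blocked_reply = guarded_generate("tell me about forbidden-topic", fake_model, keyword_filter)
```

Checking the reply as well as the prompt matters here because the model itself no longer refuses anything, so the wrapper is the only place a policy can be enforced.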
