TurkishCodeMan/gemma-4-e2b-it-abliterated
Text Generation | Concurrency Cost: 1 | Model Size: 2.5B | Quant: BF16 | Ctx Length: 8k | Published: Jun 17, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold
TurkishCodeMan/gemma-4-e2b-it-abliterated is a 2.5 billion parameter Gemma-based causal language model, developed by TurkishCodeMan, that has been surgically modified using mechanistic interpretability techniques. The modification removes the refusal mechanism from the model's latent space, so it answers prompts without the usual AI safety disclaimers or refusals. It is primarily intended for research in mechanistic interpretability, alignment, and safety testing.
Overview
TurkishCodeMan/gemma-4-e2b-it-abliterated is a 2.5 billion parameter model derived from google/gemma-4-E2B-it. What sets it apart from the base model is that it is abliterated: its refusal behavior has been removed by a targeted edit to the model's weights.
Key Capabilities
- Uncensored Responses: The model is designed to answer prompts without exhibiting typical AI safety filter disclaimers or refusal behaviors.
- Mechanistic Interpretability Application: It serves as a practical demonstration and research tool for advanced mechanistic interpretability techniques.
- Surgical Refusal Removal: The refusal mechanism was removed by isolating a "refusal vector" from the activation differences between "Safe" and "Harmful" prompts, then applying an orthogonal projection to the output weight matrices (o_proj and down_proj) of the transformer layers.
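The refusal-removal step described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the author's actual code: the function names `refusal_direction` and `project_out`, the use of a difference of mean activations, and the `(d_model, d_in)` weight layout are all assumptions made for the example.

```python
import numpy as np

def refusal_direction(harmful_acts, safe_acts):
    """Unit vector along the difference of mean activations (assumed recipe)."""
    diff = harmful_acts.mean(axis=0) - safe_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project_out(weight, direction):
    """Return (I - r r^T) @ weight for unit vector r.

    `weight` is assumed to have shape (d_model, d_in), so that
    `weight @ x` is the vector the layer writes to the residual stream.
    After the projection, that output has no component along r.
    """
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r @ weight)

# Synthetic stand-ins for real activations and weights (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, d_in = 16, 32
safe_acts = rng.normal(size=(64, d_model))
# Shift the "harmful" activations along one axis to plant a direction.
harmful_acts = rng.normal(size=(64, d_model)) + 2.0 * np.eye(d_model)[3]

r = refusal_direction(harmful_acts, safe_acts)
w_o = rng.normal(size=(d_model, d_in))   # stands in for an o_proj or down_proj matrix
w_abl = project_out(w_o, r)

# Any output of the edited matrix is orthogonal to r (up to float error).
x = rng.normal(size=d_in)
print(abs(r @ (w_abl @ x)) < 1e-9)
```

In a real model the same projection would be applied to each layer's o_proj and down_proj weights, with activations collected from actual "Safe" and "Harmful" prompts; which layer the direction is taken from is a choice this sketch does not address.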
Good For
- Research in Mechanistic Interpretability: Ideal for studying how refusal mechanisms are encoded and can be removed from large language models.
- Alignment Research: Useful for exploring model alignment and the effects of direct latent space manipulation.
- Safety Testing: Can be employed to test the boundaries and behaviors of models without built-in safety filters.
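For safety testing, one simple way to compare this model against its base is to count how often each one refuses a fixed set of prompts. The sketch below uses plain substring matching on generated text; the marker list and the helper names `is_refusal` and `refusal_rate` are hypothetical, and a keyword check of this kind is only a rough proxy that can miss or over-count refusals.

```python
# Hypothetical marker phrases; a real evaluation would tune or replace these.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def is_refusal(text, markers=REFUSAL_MARKERS):
    """Return True if the response contains any refusal marker phrase."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in lowered for marker in markers)

def refusal_rate(responses):
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)

# Example with hand-written strings standing in for model outputs.
sample = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an overview of the topic.",
]
print(refusal_rate(sample))
```

Running the same prompt set through the base and abliterated models and comparing the two rates gives a coarse view of how much refusal behavior the edit removed.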
Limitations
- Research-Oriented: This model is explicitly intended for research, particularly for understanding and manipulating model behaviors at a mechanistic level. Users are responsible for how they use it; the creators disclaim responsibility for generated outputs.