Name: noahoksuz/Holo-3.1-4B-uncensored-heretic API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: noahoksuz

Overview of Holo-3.1-4B-uncensored-heretic

This model, developed by noahoksuz, is a 4.5 billion parameter variant of the Holo-3.1-4B architecture. Its primary distinction is the removal of censorship using the Heretic tool, which applies directional ablation (also known as "abliteration"). Heretic automatically tunes the intervention parameters to minimize both the refusal rate and the KL divergence from the original model, with the aim of preserving the original model's capabilities.
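
To make the KL divergence objective concrete, here is a minimal sketch of how the divergence between the original and modified models' next-token distributions could be measured for a single prompt. The logit values are made up for illustration, and the direction KL(original || modified) is an assumption; the listing does not say which direction Heretic uses or how it averages over prompts.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a 3-token vocabulary (illustrative only).
original_logits = [2.0, 1.0, 0.1]
modified_logits = [1.8, 1.1, 0.2]

p_original = softmax(original_logits)
p_modified = softmax(modified_logits)

# Assumed direction: how far the modified model drifts from the original.
drift = kl_divergence(p_original, p_modified)
print(f"KL(original || modified) = {drift:.4f} nats")
```

A value near zero means the modified model assigns almost the same probabilities as the original on that prompt; identical distributions give exactly zero.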

Key Capabilities and Features

Significantly Reduced Refusal Rate: The model's refusal rate was reduced from 99% to just 3% on a 100-prompt test set, demonstrating effective decensoring.
Capability Preservation: The KL divergence from the original model is 0.0963 (values below 0.5 are taken to indicate minimal degradation), which suggests that the original capabilities are largely preserved after decensoring.
Methodology: The decensoring process involves computing "refusal directions" from residual stream differences and orthogonalizing projection matrices (attn.o_proj, mlp.down_proj) with respect to these directions.
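
The methodology above can be sketched in a few lines. This is a toy illustration of plain directional ablation, not Heretic's actual implementation: the activations and weight matrix are invented, the helper names (refusal_direction, orthogonalize) are hypothetical, and Heretic's automatically optimized intervention parameters are not reproduced. It assumes the weight matrix has one row per residual-stream dimension, so that removing the component along the refusal direction r amounts to W' = W - r (r^T W).

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean residual-stream activations."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def orthogonalize(weight, direction):
    """Return W - r (r^T W), so that W's outputs have no component along r.

    weight: d_model x d_in matrix (rows index the residual-stream dimension).
    direction: unit vector r of length d_model.
    """
    d_model, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(direction[i] * weight[i][j] for i in range(d_model))
            for j in range(d_in)]
    return [[weight[i][j] - direction[i] * proj[j] for j in range(d_in)]
            for i in range(d_model)]

def matvec(weight, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Invented residual-stream activations (d_model = 3) for two prompt groups.
harmful_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = refusal_direction(harmful_acts, harmless_acts)

# Stand-in for a projection matrix such as attn.o_proj or mlp.down_proj.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_ablated = orthogonalize(W, r)

# Whatever the input, the ablated matrix writes nothing along r.
out = matvec(W_ablated, [1.0, 1.0])
component_along_r = sum(o * ri for o, ri in zip(out, r))
print("component along refusal direction:", component_along_r)
```

After orthogonalization the matrix can no longer write into the refusal direction, while its output in every orthogonal direction is unchanged.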

Intended Use Cases

This model is released specifically for security research purposes, including:

Studying refusal mechanisms in large language models.
Red-teaming alignment strategies.
Improving robust safeguards for AI systems.

Users are advised to ensure compliance with applicable laws and ethical guidelines when utilizing this model.
