KaraKaraWitch/GoldDiamondGold-Paperbliteration-L33-70b

70B parameters · FP8 · 32,768-token context · Feb 17, 2026 · License: other

GoldDiamondGold-Paperbliteration-L33-70b Overview

This model is a 70-billion-parameter "abliteration" of the original KaraKaraWitch/GoldDiamondGold-L33-70b. Its primary goal is to mitigate the refusal behaviors observed in the base model while preserving its natural intelligence, textbook knowledge, and world-model scores, all of which were degraded in previous abliteration attempts.

Key Methodologies & Features

  • Targeted Abliteration: Uses a constrained optimization strategy via a custom Heretic implementation to remove refusals.
  • MLP Preservation: Preserves knowledge and reasoning by leaving the MLP layers essentially untouched (down_proj ablation weights constrained below 0.05).
  • Attention-Based Refusal Removal: Offloads refusal removal to the attention output projections (o_proj), with ablation weights constrained to the range 1.0–2.0.
  • Winsorization: Activations are winsorized at the 0.95 quantile to tame Llama-3's activation outliers, yielding more stable refusal-direction estimates.
  • High Fidelity: Achieves a KL divergence of 0.0055 against the base model's output distribution, indicating minimal behavioral drift and strong preservation of the original capabilities.
  • Reduced Refusals: Lowers the refusal rate to 12/100, a substantial improvement over the base model's 94/100, while accepting a slight trade-off relative to unconstrained abliteration (9/100).
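The constraints above can be sketched as follows. This is an illustrative NumPy reconstruction, not Heretic's actual code: the function names, the exact skip/clamp mechanics, and the difference-of-means direction estimate are assumptions about how the card's constraints would be applied.

```python
import numpy as np

def winsorize(x, q=0.95):
    """Clip activations at the q-quantile of their absolute values to tame
    Llama-3's extreme outlier channels before estimating directions."""
    lim = np.quantile(np.abs(x), q)
    return np.clip(x, -lim, lim)

def refusal_direction(harmful_acts, harmless_acts, q=0.95):
    """Unit difference-of-means refusal direction from winsorized activations.
    (Difference-of-means is the standard abliteration estimate; the card does
    not spell out the estimator, so this is an assumption.)"""
    diff = winsorize(harmful_acts, q).mean(axis=0) - winsorize(harmless_acts, q).mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(W, direction, weight):
    """Subtract `weight` times the refusal direction's component from a
    projection matrix W. Mirrors the card's constraints: weights below 0.05
    leave the matrix untouched (the MLP down_proj case); otherwise the
    weight is clamped into [1.0, 2.0] (the attention o_proj case)."""
    if weight < 0.05:
        return W                      # MLP preservation: no change
    w = float(np.clip(weight, 1.0, 2.0))
    d = direction.reshape(-1, 1)      # (hidden, 1) unit column vector
    return W - w * d @ (d.T @ W)      # remove the direction's component
```

With `weight=1.0` this fully orthogonalizes W against the refusal direction; weights above 1.0 overshoot past orthogonality, which is what the 1.0–2.0 constraint on o_proj permits.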
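The 0.0055 fidelity figure is a KL divergence between the base and abliterated models' next-token distributions. A minimal, numerically stable way to compute such a per-position score from raw logits (how the card averages over prompts and positions is not specified, so any reduction is an assumption):

```python
import numpy as np

def kl_from_logits(logits_p, logits_q):
    """Row-wise KL(P || Q) from raw logits, via a stable log-softmax."""
    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)   # shift for stability
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    lp, lq = log_softmax(logits_p), log_softmax(logits_q)
    return (np.exp(lp) * (lp - lq)).sum(axis=-1)
```

A value near zero, like the reported 0.0055, means the abliterated model's predictions are nearly indistinguishable from the base model's on the evaluation prompts.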

Ideal Use Cases

This model is particularly well-suited for applications where:

  • Reducing model refusals is critical.
  • Maintaining high factual knowledge and reasoning capabilities is paramount.
  • The slight increase in refusal rate compared to aggressive abliteration is acceptable in exchange for preserving structural and semantic integrity.