hackoffice/Loki-V2.0-Heretic-Uncensored
hackoffice/Loki-V2.0-Heretic-Uncensored is a 70 billion parameter language model derived from CrucibleLab's L3.3-70B-Loki-V2.0 using the Heretic optimization methodology. This experimental research artifact substantially reduces refusal behavior, recording a 6% refusal rate in testing (6 refusals out of 100 prompts) by applying vector intervention to the deep transformer layers (50-60). It is intended for research into model alignment, vector arithmetic, and uninhibited creative writing.
Model Overview
hackoffice/Loki-V2.0-Heretic-Uncensored is an experimental 70 billion parameter language model, derived from CrucibleLab's L3.3-70B-Loki-V2.0. It leverages the Heretic repository's optimization methodology, specifically a targeted vector intervention technique (orthogonalization/abliteration) tuned via Optuna.
Key Characteristics
- Significantly Reduced Refusals: This model recorded only 6 refusals out of 100 in its test set, indicating a substantial reduction in refusal mechanisms compared to its base model.
- Deep Layer Intervention: Unlike earlier iterations, this version targets the deep layers (50-60) of the transformer stack. Intervening this late in the stack helps preserve coherence (syntax and logic) while neutralizing safety filters.
- High Coherence: Shows a KL divergence of ~0.0169 relative to the base model, meaning its output distribution stays very close to the base model's in syntax and logic despite the reduced refusals.
- Asymmetric Intervention: The intervention heavily modified the attention output projections (around layers 54-55) while changing the MLP components more conservatively, suggesting that refusal behavior can be neutralized with a narrowly targeted edit.
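To make the two headline numbers above concrete, the following is a minimal sketch of how a KL divergence between a base and a modified model's next-token distributions, and a refusal rate, could be computed. It is not the Heretic evaluation code: the logit values are made up for illustration, the helper names `softmax` and `kl_divergence` are hypothetical, and the choice of the base model as the reference distribution is an assumption.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats; eps guards against log(0) for tiny probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Illustrative (made-up) next-token logits over a 4-token vocabulary.
base_logits = [2.0, 1.0, 0.1, -1.0]      # assumed: base model
modified_logits = [2.1, 0.9, 0.1, -1.1]  # assumed: modified model

p = softmax(base_logits)
q = softmax(modified_logits)
kl = kl_divergence(p, q)  # small value: the two distributions nearly agree

# Refusal rate as reported in this card: 6 refusals out of 100 test prompts.
refusals, prompts = 6, 100
refusal_rate = refusals / prompts

print(f"KL divergence: {kl:.4f} nats")
print(f"Refusal rate: {refusal_rate:.0%}")
```

A value near zero means the modified model assigns almost the same probabilities to the next token as the base model does. In practice such a figure would be averaged over many prompts rather than taken from a single distribution.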
Intended Use Cases
This model is a research artifact designed for:
- Exploring the limits of vector-based intervention and model alignment.
- Investigating deep-layer semantic processing.
- Facilitating uninhibited creative writing and roleplay scenarios where typical safety guardrails are undesirable.
Caution: Due to the removal of most safety guardrails, this model may generate content for sensitive prompts that the base model would typically refuse.