heretic-org/Meta-Llama-3.1-8B-Instruct-heretic

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: May 3, 2026 · License: llama3.1 · Architecture: Transformer

heretic-org/Meta-Llama-3.1-8B-Instruct-heretic is an 8-billion-parameter instruction-tuned causal language model derived from unsloth/Meta-Llama-3.1-8B-Instruct. Built by heretic-org with the Heretic v1.2.0 tool using the Self-Organizing Maps (SOM) method, it is engineered to refuse far less often than its base model. It maintains comparable performance on benchmarks such as PIQA while generating decensored responses, making it suitable for applications that require less restrictive content generation.


Overview

This model, heretic-org/Meta-Llama-3.1-8B-Instruct-heretic, is an 8-billion-parameter instruction-tuned language model. It is a decensored version of the unsloth/Meta-Llama-3.1-8B-Instruct base model, created with the Heretic v1.2.0 tool. The decensoring process used the Self-Organizing Maps (SOM) method with row-norm preservation and direction orthogonalization.
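The core idea behind direction orthogonalization can be sketched in a few lines: project the learned "refusal direction" out of a weight matrix's output space, so the layer can no longer write along that direction. The toy matrix and direction below are illustrative values, not the model's actual parameters, and this is a simplified sketch of the general technique rather than Heretic's exact implementation:

```python
# Hedged sketch of direction orthogonalization (abliteration):
# given weight W with shape [out][in] (output = W @ x) and a unit
# "refusal direction" d in output space, form W' = W - d (d^T W),
# so that d^T W' = 0 and the layer cannot write along d.

def orthogonalize(W, d):
    # Normalize the direction vector.
    norm = sum(x * x for x in d) ** 0.5
    d = [x / norm for x in d]
    rows, cols = len(W), len(W[0])
    # d^T W: the component of each output column along d.
    dTW = [sum(d[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    # Subtract the rank-1 correction d (d^T W).
    return [[W[i][j] - d[i] * dTW[j] for j in range(cols)] for i in range(rows)]

# With d along the first output axis, the first row is zeroed out.
W2 = orthogonalize([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0])
```

Row-norm preservation (which the card also mentions) would additionally rescale each row of the result back to its original norm; that step is omitted here for brevity.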

Key Capabilities

  • Reduced Refusals: The primary differentiator is a far lower refusal rate: 3 refusals out of 416 test prompts, versus 406 out of 416 for the original model. The KL divergence from the base model's outputs is only 0.0250, indicating that ordinary generations are barely perturbed.
  • Performance Preservation: Despite decensoring, the model performs comparably on benchmarks such as PIQA (acc,none of 0.8025 vs. 0.8020 for the base model).
  • Abliteration Parameters: The README details specific abliteration parameters used in the decensoring process, including values for direction_index, attn.o_proj weights, and mlp.down_proj weights.
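To make the 0.0250 KL figure concrete: KL divergence measures how much one next-token distribution diverges from another, and a value near zero means the abliterated model's token probabilities stay close to the base model's. A minimal sketch, using toy 4-token distributions that are illustrative only:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions, in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions standing in for base vs. abliterated outputs.
base = [0.70, 0.15, 0.10, 0.05]
heretic = [0.68, 0.16, 0.11, 0.05]

kl = kl_divergence(base, heretic)  # small positive value: nearly identical
```

A KL of 0 means the distributions are identical; values on the order of 0.01–0.03, as reported for this model, indicate the decensoring left typical generations nearly unchanged.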

Good for

  • Use cases requiring a less restrictive or "decensored" language model.
  • Applications where the base Meta-Llama-3.1-8B-Instruct model's refusal rate is too high.
  • Developers interested in exploring models modified for specific content generation policies while retaining core language capabilities.
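For developers who want to try the model, a minimal sketch using the Hugging Face transformers chat pipeline follows. The repo id comes from this card; the system prompt, sampling settings, and helper names are assumptions, not part of the model's documentation:

```python
# Hedged sketch: querying the model via the transformers text-generation
# pipeline. Requires `transformers` and a GPU with enough memory for 8B FP8.

def build_chat(user_prompt: str) -> list[dict]:
    """Build a chat message list in the standard role/content format."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so build_chat stays usable without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="heretic-org/Meta-Llama-3.1-8B-Instruct-heretic",
        torch_dtype="auto",
        device_map="auto",
    )
    out = pipe(build_chat(user_prompt), max_new_tokens=max_new_tokens)
    # Chat pipelines return the full message list; the last entry is the reply.
    return out[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    print(generate("Summarize what abliteration does, in two sentences."))
```

Since the model shares its architecture and chat template with Meta-Llama-3.1-8B-Instruct, it should drop into any serving stack that already supports the base model.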