Vaxispraxis/Llama-3.1-8B-Instruct-heretic is an 8-billion-parameter instruction-tuned Llama 3.1 model developed by Vaxispraxis. The model has undergone post-training behavioral modification using the Heretic framework to significantly reduce refusal responses and increase permissiveness. It achieves this by manipulating residual streams and subtracting refusal-associated components, making it suitable for use cases requiring less constrained outputs.
Overview of Llama-3.1-8B-Instruct-Heretic
This model is a specialized version of Llama 3.1 8B Instruct, developed by Vaxispraxis, that relies on post-training behavioral modification rather than traditional fine-tuning. Its core innovation is the use of the Heretic framework to reduce refusal responses and increase the directness of outputs.
Key Capabilities and Methodology
Unlike standard fine-tuning, Llama-3.1-8B-Instruct-Heretic employs:
- Residual stream manipulation: Directly altering the model's internal processing.
- Directional vector subtraction (abliteration): Identifying and removing components associated with refusal behaviors.
- KL-divergence constrained optimization: Ensuring that behavioral changes are controlled and do not drastically alter core capabilities.
The optimization ran for 200 trials against a KL-divergence target of 0.01, using datasets such as mlabonne/harmless_alpaca and mlabonne/harmful_behaviors as contrast sets for calibration and evaluation.
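The two core mechanics above can be illustrated with a small numeric sketch. This is not the Heretic implementation; it is a toy model, assuming a single refusal direction computed as the difference of mean activations over harmful and harmless prompt sets, removed by projecting it out of the residual stream, with a KL-divergence check between baseline and modified next-token distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual-stream activations (hypothetical shapes: n_prompts x d_model).
d_model = 64
harmless = rng.normal(size=(100, d_model))
harmful = rng.normal(size=(100, d_model)) + 2.0 * np.eye(d_model)[0]  # shifted cluster

# 1. Directional vector: normalized difference of mean activations.
direction = harmful.mean(axis=0) - harmless.mean(axis=0)
direction /= np.linalg.norm(direction)

def ablate(acts, d):
    """Subtract each activation's component along the unit vector d."""
    return acts - np.outer(acts @ d, d)

ablated = ablate(harmful, direction)
# After ablation, activations have (near-)zero projection onto the direction.
print(np.abs(ablated @ direction).max())

# 2. KL-divergence between baseline and modified output distributions,
#    the quantity the optimization constrains to stay near its target.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

logits_base = rng.normal(size=32000)
logits_mod = logits_base + rng.normal(scale=0.01, size=32000)  # small perturbation
print(kl(softmax(logits_base), softmax(logits_mod)))
```

In the real model, the direction is found per layer from activations on the contrast datasets, and the subtraction is applied to the model's weights or hooks so it persists at inference time.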
Behavioral Characteristics and Trade-offs
Compared to the base Llama 3.1 model, this "Heretic" version exhibits:
- Reduced refusal frequency and more permissive responses.
- Increased directness in its answers.
However, these modifications come with trade-offs, including a potential increase in unsafe or unfiltered outputs and reduced alignment safeguards. Users should be aware that the model's behavior is highly dependent on prompt phrasing, and it offers no semantic safety guarantees. The model is also distributed unquantized, so it requires more VRAM than quantized variants.
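To make the VRAM implication concrete, a back-of-envelope estimate of the weight footprint at common precisions (the 8.03B parameter count is the approximate figure for Llama 3.1 8B; KV cache and activations add further overhead on top of this):

```python
# Approximate weight-only memory footprint of an unquantized 8B model.
params = 8_030_000_000  # approximate parameter count for Llama 3.1 8B

bytes_per_param = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{dtype}: {gib:.1f} GiB")  # weights only, excludes KV cache
```

At 16-bit precision this works out to roughly 15 GiB of weights, which is why the unquantized release needs a 24 GB-class GPU for comfortable inference, whereas 4-bit quantized variants fit in well under 8 GiB.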