Model Overview
failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 is a modified version of meta-llama/Meta-Llama-3-8B-Instruct, with 8 billion parameters and an 8192-token context length. This iteration uses a refined methodology based on "abliteration", or orthogonalization, as described in the paper 'Refusal in LLMs is mediated by a single direction'. The primary modification manipulates specific weights to inhibit the model's tendency to refuse, making it effectively "uncensored" while aiming to avoid introducing new behaviors or altering its core knowledge.
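To make the idea of orthogonalization concrete, here is a minimal sketch in plain Python of projecting a single direction out of a weight matrix. It is an illustration under stated assumptions, not the code used to produce this model: the helper names (`unit`, `orthogonalize`, `matvec`), the tiny 3x2 matrix, and the example direction are all hypothetical, and a real implementation would apply the same projection to tensors in a library such as PyTorch.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def matvec(weight, x):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [dot(row, x) for row in weight]

def orthogonalize(weight, direction):
    """Compute W' = W - r (r^T W), where r is the unit-length direction.

    After this, no input can make the matrix produce an output with a
    component along `direction`.
    """
    r = unit(direction)
    cols = len(weight[0])
    # proj[j] is the component of column j of W along r.
    proj = [sum(r[i] * weight[i][j] for i in range(len(r))) for j in range(cols)]
    return [[weight[i][j] - r[i] * proj[j] for j in range(cols)]
            for i in range(len(r))]

# Hypothetical toy values, chosen only for illustration.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
direction = [0.0, 3.0, 4.0]

W_abl = orthogonalize(W, direction)
output = matvec(W_abl, [1.0, -2.0])
leak = dot(unit(direction), output)  # component of the output along the direction
```

In this sketch `leak` comes out as zero up to floating-point error for any input vector, which is the sense in which the direction has been "removed" from what the matrix can write. Rows of `W` that have no overlap with the direction (here the first row) are left unchanged.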
Key Capabilities & Differentiators
- Refusal Inhibition: The model's strongest refusal directions have been removed by orthogonalization, with the aim of preventing ethical lecturing and safety-related refusals.
- Preserved Core Functionality: Unlike traditional fine-tuning, this ablation technique is designed to be surgical, keeping the original Llama 3 model's knowledge and training intact.
- Efficient Modification: Ablation requires significantly less data than fine-tuning to induce or remove a specific feature.
- Reduced Hallucinations: The V3 methodology is reported to induce fewer hallucinations than previous iterations.
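One way to see why a small amount of data can be enough is that a single direction can be estimated from a handful of activation vectors. The sketch below uses a difference of means between two groups of activations; this estimator is an assumption for illustration, not a confirmed description of how this model was produced, and the function names and the synthetic three-dimensional "activations" are hypothetical.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def estimate_direction(refusing, complying):
    """Unit vector pointing from the mean 'complying' activation
    to the mean 'refusing' activation (difference of means)."""
    diff = [a - b for a, b in zip(mean_vector(refusing), mean_vector(complying))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

# Synthetic activations: the two groups differ only along the first axis.
refusing_acts = [[2.0, 0.1, 1.0], [2.2, -0.1, 1.0], [1.8, 0.0, 1.0]]
complying_acts = [[0.0, 0.1, 1.0], [0.2, -0.1, 1.0], [-0.2, 0.0, 1.0]]

refusal_dir = estimate_direction(refusing_acts, complying_acts)
```

With only three examples per group, the estimate already points along the first axis, the only axis on which the synthetic groups differ. Real activations are far noisier and higher-dimensional, so this shows the mechanism rather than a realistic data requirement.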
Good For
- Developers seeking an instruction-tuned Llama 3 model with an uncensored response style.
- Use cases where avoiding model refusals is critical but broad behavioral changes are not wanted.
- Exploration of novel model modification techniques, particularly orthogonalization for feature removal.