Name: failspy/llama-3-70B-Instruct-abliterated API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: failspy

Model Overview

failspy/llama-3-70B-Instruct-abliterated is an experimental 70-billion-parameter instruction-tuned model based on Meta's Llama-3-70B-Instruct. It applies the methodology described in "Refusal in LLMs is mediated by a single direction": the model's bfloat16 safetensors weights are orthogonalized against a 'refusal direction', with the aim of reducing the model's tendency to refuse requests or deliver ethical lectures.
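As a rough illustration of what orthogonalizing a weight matrix against a direction means, the following minimal sketch removes the component along a given direction from everything a matrix can output. It is not the code from ortho_cookbook.ipynb: the function names (normalize, orthogonalize, matvec, dot) and the tiny 3x2 matrix are hypothetical, and plain Python lists stand in for the tensors a real implementation would use.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]


def orthogonalize(W, direction):
    """Return W' = W - r r^T W, where r is the unit-length direction.

    W maps inputs to outputs of length len(direction). After the update,
    no output of W' has any component along r.
    (Hypothetical helper for illustration only.)
    """
    r = normalize(direction)
    n_rows, n_cols = len(W), len(W[0])
    # proj[j] is the component of column j of W along r.
    proj = [sum(r[i] * W[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Subtract that component from every column.
    return [[W[i][j] - r[i] * proj[j] for j in range(n_cols)]
            for i in range(n_rows)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy weights, assumed values
    refusal_dir = [1.0, 2.0, 2.0]             # toy direction, assumed values
    W_ortho = orthogonalize(W, refusal_dir)
    out = matvec(W_ortho, [0.7, -1.3])
    # The output's component along the direction should be numerically zero.
    print(abs(dot(normalize(refusal_dir), out)) < 1e-9)
```

Outputs that were already orthogonal to the direction pass through unchanged, which is why the technique is expected to leave most other behavior intact, although the source notes that side effects are possible.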

Key Characteristics

Abliterated Refusal: The model has been modified to inhibit refusal behaviors, though refusals are not guaranteed to disappear entirely.
Llama-3-70B-Instruct Base: Retains the core capabilities and tuning of the original Llama-3-70B-Instruct model.
Experimental Nature: This is a novel application of ablation, and users are encouraged to explore and report any unique quirks or side effects.
Tinkering Friendly: The refusal_dir.pth file is included, allowing users to apply the orthogonalization to their own downloaded Llama-3-70B-Instruct models using the provided ortho_cookbook.ipynb.

Use Cases

This model is particularly suited for:

Research and Experimentation: Ideal for exploring the effects of refusal direction ablation on LLM behavior.
Applications requiring direct responses: Where minimizing ethical caveats or refusals is a priority and the model's experimental nature is understood and accepted.
Developers interested in model modification: For those who wish to apply similar methodologies or further develop this approach.
