Name: theo77186/Llama-3-8B-Instruct-norefusal API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: theo77186

Model Overview

theo77186/Llama-3-8B-Instruct-norefusal is an 8-billion-parameter instruction-tuned model based on the Llama 3 architecture. Its core modification is orthogonal feature ablation, a technique from research showing that refusal behavior in LLMs is mediated by a single direction in the model's activation space. Once that direction is identified, it can be projected out of the model's weights so that the layers can no longer write along it.
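The projection at the heart of orthogonal feature ablation can be illustrated with toy vectors. The sketch below is in plain Python and uses no real model weights. It assumes a unit vector `r_hat` already represents the refusal direction. `ablate` removes that direction from a single activation, computing x - (x·r̂)r̂. `orthogonalize_weight` applies the same projection to a matrix whose outputs land in the hidden space, computing (I - r̂r̂ᵀ)W. All function names are hypothetical and are not taken from the model author's tooling.

```python
# Hypothetical sketch of directional ablation on toy vectors (pure Python).
# Assumption: r_hat is a unit-length "refusal direction" in the hidden space.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def ablate(x, r_hat):
    """Remove the component of activation x along r_hat: x - (x . r_hat) r_hat."""
    coeff = dot(x, r_hat)
    return [xi - coeff * ri for xi, ri in zip(x, r_hat)]

def orthogonalize_weight(W, r_hat):
    """Return (I - r_hat r_hat^T) W for W given as a list of rows.

    W's outputs live in the hidden space (rows index hidden dimensions), so
    every column of the result is orthogonal to r_hat and W' @ v never has
    a component along the refusal direction.
    """
    n_rows, n_cols = len(W), len(W[0])
    # Projection of each column onto r_hat: c_j = sum_i r_hat[i] * W[i][j]
    coeffs = [sum(r_hat[i] * W[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[W[i][j] - r_hat[i] * coeffs[j] for j in range(n_cols)]
            for i in range(n_rows)]

if __name__ == "__main__":
    r_hat = normalize([1.0, 1.0, 0.0])
    print(ablate([3.0, 1.0, 2.0], r_hat))  # approximately [1.0, -1.0, 2.0]
    print(orthogonalize_weight([[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]], r_hat))
```

Because the projection is applied to the weights themselves, an edited model needs no runtime hooks. In the approach described by the refusal-direction research, this projection is typically applied to each matrix that writes into the residual stream. The toy matrix above stands in for one such matrix.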

Key Modifications & Training

This model has been modified specifically to reduce refusal behavior. The refusal direction was extracted between layers 16 and 17 using the following calibration data:

256 prompts from jondurbin/airoboros-2.2
256 prompts from AdvBench
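The model card does not spell out how the direction was computed from these prompts. A common approach in the refusal-direction literature is a difference in means: average the hidden-state activations at the chosen layer for refusal-triggering prompts, subtract the average for contrasting prompts, and normalize the result. The sketch below assumes that method. It also assumes the AdvBench prompts serve as the refusal-triggering set and the airoboros-2.2 prompts as the contrasting set. The activations here are tiny hand-written lists standing in for real hidden states, and the function names are hypothetical.

```python
# Hypothetical sketch: difference-in-means refusal direction at one layer.
# Assumptions: AdvBench prompts = refusal-triggering set, airoboros-2.2
# prompts = contrasting set; each activation is a hidden-state vector read
# from the chosen layer (toy 2-D lists here, not real model outputs).

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

if __name__ == "__main__":
    harmful = [[2.0, 0.0], [4.0, 2.0]]   # stand-ins for AdvBench activations
    harmless = [[0.0, 1.0], [0.0, 1.0]]  # stand-ins for airoboros-2.2 activations
    print(refusal_direction(harmful, harmless))  # [1.0, 0.0]
```

With 256 prompts per set, as listed above, each mean would average 256 hidden-state vectors instead of two.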

Intended Use & Limitations

The primary goal of this model is to respond with fewer refusals, particularly to prompts that would typically trigger safety-based rejections. Although orthogonal feature ablation substantially reduces refusals, the developer notes that some instructions related to violence may still be refused, which suggests that a full fine-tune might be necessary to remove refusals completely. Users are advised to use this model responsibly, as the developer disclaims liability for its use.
