Name: skysys00/Meta-Llama-3-8B-Instruct-DeepRefusal API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: skysys00

Model Overview

skysys00/Meta-Llama-3-8B-Instruct-DeepRefusal is an instruction-tuned model built on Meta-Llama-3-8B-Instruct. Developed by YuanBoXie, it centers on a strengthened refusal mechanism intended to improve the safety and alignment of large language models.

Key Capabilities

Advanced Refusal Mechanism: Implements a novel approach to strengthen refusal behaviors, moving "Beyond Surface Alignment" by probabilistically ablating refusal directions.
Enhanced Safety: Designed to provide more robust and controlled responses, particularly in scenarios where refusing inappropriate or harmful queries is critical.
Research-Backed: The methodology behind this model is detailed in the EMNLP 2025 paper, "Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction".
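
To make the phrase "probabilistically ablating refusal directions" more concrete, here is a minimal sketch of the general idea: removing a vector's component along a given direction, and doing so only with some probability. It is not the paper's training procedure, which this page does not describe. The function names, the parameter p, and the toy two-dimensional vectors are all hypothetical and chosen for illustration only.

```python
import random


def ablate_direction(hidden, direction):
    """Return a copy of `hidden` with its component along `direction` removed.

    Illustrative assumption: plain projection, h' = h - ((h . r) / (r . r)) * r.
    """
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0.0:
        # A zero vector defines no direction, so there is nothing to remove.
        return list(hidden)
    coeff = sum(h * d for h, d in zip(hidden, direction)) / norm_sq
    return [h - coeff * d for h, d in zip(hidden, direction)]


def maybe_ablate(hidden, direction, p, rng):
    """With probability `p`, ablate `direction` from `hidden`; otherwise copy it.

    `p` and `rng` are hypothetical parameters, not values taken from the paper.
    """
    if rng.random() < p:
        return ablate_direction(hidden, direction)
    return list(hidden)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [3.0, 4.0]       # toy stand-in for a hidden-state vector
    refusal_dir = [1.0, 0.0]  # toy stand-in for a "refusal direction"
    print(ablate_direction(hidden, refusal_dir))
    print(maybe_ablate(hidden, refusal_dir, p=0.5, rng=rng))
```

In this sketch, p = 0 leaves every vector untouched and p = 1 always removes the component. How the actual model chooses directions, layers, and probabilities is a question for the paper itself.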

Good For

Applications requiring strong and reliable refusal capabilities.
Research into LLM safety, alignment, and refusal mechanisms.
Use cases where preventing harmful or undesirable outputs is a primary concern.
