SAFEPATH-R-7B Overview
AI-ISL/DeepSeek-R1-Distill-Qwen-7B-SP, also known as SAFEPATH-R-7B, is a 7.6-billion-parameter model derived from DeepSeek-R1-Distill-Qwen-7B. Its core contribution is the SAFEPATH alignment technique, which fine-tunes the model to open its reasoning block with a short "Safety Primer" phrase ("Let's think about safety first"). This minimal intervention steers the subsequent chain of thought toward safer reasoning.
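The primer placement can be illustrated with a minimal sketch. The snippet below only shows the prompt shape: it assumes DeepSeek-R1's `<think>` delimiter convention for the reasoning block, and the helper function is hypothetical, not part of the SAFEPATH release (the actual model is fine-tuned to emit the primer on its own).

```python
# Hypothetical sketch: prepend a SAFEPATH-style Safety Primer to the start of
# the reasoning block. "<think>" follows the DeepSeek-R1 chat convention;
# build_reasoning_prefix is an illustrative helper, not an official API.

SAFETY_PRIMER = "Let's think about safety first."

def build_reasoning_prefix(user_prompt: str) -> str:
    """Return a generation prefix whose reasoning block opens with the primer."""
    return f"{user_prompt}\n<think>\n{SAFETY_PRIMER}\n"

prefix = build_reasoning_prefix("How do I secure my home Wi-Fi network?")
print(prefix)
```

In the fine-tuned model, the primer tokens appear at this position naturally during generation; everything after the primer is produced by the model's ordinary reasoning process.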
Key Capabilities and Features
- Improved Safety: Significantly reduces harmful outputs, as measured on safety benchmarks such as StrongReject and BeaverTails, and demonstrates robustness against jailbreak attacks.
- Preserved Reasoning Performance: Maintains high accuracy across challenging reasoning benchmarks such as MATH500, GPQA, and AIME24, indicating that the safety alignment does not degrade its analytical capabilities.
- Efficiency: Achieves its safety alignment with remarkable efficiency, requiring only 100 fine-tuning steps.
Intended Use Cases
This model is primarily intended for research and development in specific areas:
- Safety Alignment: Investigating and advancing safety alignment techniques in Large Reasoning Models (LRMs).
- Robust Reasoning: Studying how reasoning models maintain performance in adversarial settings.
- Chain-of-Thought Alignment: Exploring different methodologies for chain-of-thought alignment.
For more in-depth technical details, refer to the associated paper.