AI-ISL/DeepSeek-R1-Distill-Qwen-7B-SP

Text Generation · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: May 26, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

AI-ISL/DeepSeek-R1-Distill-Qwen-7B-SP is a 7.6 billion parameter language model: a SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B. It is fine-tuned using prefix-only safety priming to improve safety without compromising reasoning performance. The model is intended for research into safety alignment in Large Reasoning Models (LRMs) and into robust reasoning in adversarial settings.


SAFEPATH-R-7B Overview

AI-ISL/DeepSeek-R1-Distill-Qwen-7B-SP, also known as SAFEPATH-R-7B, is a 7.6 billion parameter model derived from DeepSeek-R1-Distill-Qwen-7B. Its core innovation lies in its SAFEPATH alignment technique, which involves inserting a "Safety Primer" phrase ("Let's think about safety first") at the beginning of the reasoning block. This minimal intervention encourages safer reasoning.
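The primer insertion can be sketched as simple prompt construction. This is a minimal illustration, not the authors' code: it assumes the DeepSeek-R1 convention of a `<think>` reasoning block and uses plain-text role markers for readability; the exact chat template of the released model may differ.

```python
# Illustrative sketch of SAFEPATH-style priming: the Safety Primer phrase is
# placed at the very start of the reasoning block, so the model continues its
# chain of thought from a safety-oriented prefix.
# Assumptions: <think>...</think> reasoning format (DeepSeek-R1 style);
# role markers below are illustrative placeholders, not the real template.

SAFETY_PRIMER = "Let's think about safety first"


def build_primed_prompt(user_query: str) -> str:
    """Return a prompt whose reasoning block opens with the Safety Primer."""
    return (
        f"User: {user_query}\n"
        f"Assistant: <think>\n"
        f"{SAFETY_PRIMER}\n"  # generation continues from this prefix
    )
```

In practice one would tokenize this string directly and call the model's `generate` method, rather than using a chat template, since the primer must land inside the reasoning block rather than in the user turn.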

Key Capabilities and Features

  • Improved Safety: Significantly reduces harmful outputs, as measured on safety benchmarks such as StrongReject and BeaverTails, and demonstrates robustness against jailbreak attacks.
  • Preserved Reasoning Performance: Maintains high accuracy across challenging reasoning benchmarks such as MATH500, GPQA, and AIME24, indicating that the safety alignment does not degrade its analytical capabilities.
  • Efficiency: Achieves its safety alignment with remarkable efficiency, requiring only 100 fine-tuning steps.
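Prefix-only safety priming means the supervised target begins the reasoning block with the primer, so only that short prefix steers the trajectory. A hedged sketch of how such training targets might be assembled (the function name and example strings are illustrative, not taken from the SAFEPATH codebase):

```python
# Illustrative construction of a prefix-only training target: the primer is
# the fixed opening of the <think> block, followed by the original reasoning
# and answer. Assumption: <think>...</think> format as used by DeepSeek-R1.

SAFETY_PRIMER = "Let's think about safety first"


def make_training_target(reasoning: str, answer: str) -> str:
    """Build a supervised target whose reasoning opens with the primer."""
    return (
        f"<think>\n{SAFETY_PRIMER}\n"  # the only safety-specific tokens
        f"{reasoning}\n</think>\n"
        f"{answer}"
    )
```

Because only the primer prefix differs from ordinary reasoning data, very few fine-tuning steps (100, per the model card) are needed to instill the behavior.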

Intended Use Cases

This model is primarily intended for research and development in specific areas:

  • Safety Alignment: Investigating and advancing safety alignment techniques in Large Reasoning Models (LRMs).
  • Robust Reasoning: Studying how models maintain reasoning performance under adversarial conditions.
  • Chain-of-Thought Alignment: Exploring different methodologies for chain-of-thought alignment.

For more in-depth technical details, refer to the associated paper.