AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP

Hosted on Hugging Face

  • Task: Text Generation
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Published: May 26, 2025
  • License: apache-2.0
  • Architecture: Transformer (open weights)
  • Concurrency Cost: 1

AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP is a SAFEPATH-aligned version of the DeepSeek-R1-Distill-Llama-8B model, developed by AI-ISL. This model is fine-tuned using a prefix-only safety priming technique to enhance safety and robustness against harmful outputs and jailbreak attacks. It maintains strong reasoning performance across mathematical and general reasoning tasks while significantly reducing unsafe responses. The model is primarily intended for research in safety alignment and robust reasoning within large reasoning models.


Overview

AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP is a specialized version of the DeepSeek-R1-Distill-Llama-8B model, developed by AI-ISL. It incorporates the SAFEPATH alignment technique, which involves inserting a "Safety Primer" phrase at the beginning of the reasoning block. This method encourages safer reasoning without compromising the model's core performance.
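The prefix-only priming idea can be illustrated with a small sketch. Note the assumptions: the exact Safety Primer phrase used by AI-ISL is not stated in this card, so the string below is a placeholder, and the chat-template tokens are schematic rather than the model's actual special tokens. DeepSeek-R1-style models do emit their chain of thought inside `<think>` tags, which is where the primer is inserted.

```python
# Sketch of SAFEPATH-style prefix-only safety priming: a fixed primer
# phrase is placed at the start of the reasoning block, so the model
# continues its chain of thought from the primer instead of from an
# empty <think> block.

# Placeholder text -- the actual primer phrase used by AI-ISL is not
# given in this card.
SAFETY_PRIMER = "Let's think about safety first."


def build_safepath_prompt(question: str, primer: str = SAFETY_PRIMER) -> str:
    """Build a prompt whose reasoning block opens with the safety primer.

    The "User:"/"Assistant:" markers are schematic; a real deployment
    would use the model's own chat template instead.
    """
    return f"User: {question}\nAssistant: <think>\n{primer}\n"


prompt = build_safepath_prompt("How do I secure a home Wi-Fi network?")
print(prompt)
```

In practice the resulting prompt would be tokenized and passed to the model for generation; the primer occupies only the first few reasoning tokens, which is what lets the technique steer safety without retraining the rest of the reasoning process.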

Key Capabilities

  • Improved Safety: Significantly reduces harmful outputs on safety benchmarks such as StrongReject and BeaverTails, and demonstrates robustness against various jailbreak attempts.
  • Preserved Reasoning: Maintains high accuracy on complex reasoning benchmarks such as MATH500, GPQA, and AIME24, indicating that safety alignment does not degrade its analytical capabilities.
  • Efficiency: Achieves its safety alignment with a highly efficient fine-tuning process, requiring only 20 steps.

Intended Use Cases

This model is particularly well-suited for research in:

  • Safety alignment methodologies for Large Reasoning Models (LRMs).
  • Investigating robust reasoning mechanisms in adversarial environments.
  • Studies focused on Chain-of-Thought alignment techniques.

For more detailed information, refer to the associated paper.