RealSafe/RealSafe-R1-7B
RealSafe/RealSafe-R1-7B is a 7.6-billion-parameter, safety-enhanced variant of DeepSeek-R1-Distill-Qwen-7B with a 131,072-token context length. Developed by RealSafe, the model is fine-tuned on customized safety datasets to significantly improve robustness against jailbreak attacks and malicious queries. It excels at detecting and refusing harmful prompts while retaining strong reasoning and general performance.
RealSafe-R1-7B: Safety-Enhanced LLM
RealSafe-R1-7B is a 7.6-billion-parameter language model developed by RealSafe, built upon the DeepSeek-R1-Distill-Qwen-7B architecture. Its primary differentiator is enhanced safety awareness: the model is specifically fine-tuned to improve robustness against malicious queries and jailbreak attacks. It leverages supervised fine-tuning (SFT) on custom safety-focused datasets to achieve stronger refusal behavior on adversarial prompts.
Key Capabilities
- Superior Safety: Achieves significantly higher refusal rates against harmful requests (e.g., a 99.78% refusal rate on the StrongReject benchmark under the 'None' attack setting, compared to 55.06% for the base model).
- Retained Reasoning: Maintains high-quality performance across diverse reasoning tasks, including common sense, logic, and mathematical problems, demonstrating minimal degradation compared to its base model.
- Harmful Content Refusal: Effectively detects and refuses prompts requesting assistance with unethical, illegal, or policy-violating activities, as showcased in case studies involving deceptive emails and illegal operations.
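Since RealSafe-R1-7B shares its architecture with DeepSeek-R1-Distill-Qwen-7B, it can be loaded with the standard Hugging Face transformers chat API. The sketch below is a minimal, illustrative example: the model ID comes from this card, but the generation settings (e.g., `max_new_tokens`) are assumptions, not official recommendations.

```python
MODEL_ID = "RealSafe/RealSafe-R1-7B"  # model ID from this card

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format used by apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the prompt helper above works even without the
    # model weights downloaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

With this setup, a benign query such as `generate("Explain why phishing emails are dangerous.")` should be answered normally, while an adversarial or jailbreak-style prompt should elicit an explicit refusal.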
Ideal Use Cases
RealSafe-R1-7B is particularly well-suited for applications where safety and robustness against adversarial inputs are critical. This includes:
- Customer Service Bots: Ensuring responses remain ethical and compliant.
- Content Moderation: Aiding in the identification and refusal of harmful content generation.
- Secure AI Assistants: Providing a safer foundation for interactive AI systems that might encounter malicious user prompts.