CWRUSafetyLab/Qwen2.5-1.5B-Instruct-EASE
CWRUSafetyLab/Qwen2.5-1.5B-Instruct-EASE is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by CWRUSafetyLab, this model features adaptive safety reasoning activation, designed to trigger explicit safety responses only under jailbreak-like semantics. It aims to improve robustness against jailbreak attacks while preserving general task effectiveness and efficiency, making it suitable for safety-oriented research.
Loading preview...
Overview
CWRUSafetyLab/Qwen2.5-1.5B-Instruct-EASE is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its core innovation lies in its adaptive safety reasoning activation, a mechanism developed under the EASE framework. This model is engineered to selectively engage explicit safety reasoning, primarily in response to prompts that resemble jailbreak attempts, while avoiding unnecessary safety interventions for benign or general queries. This approach aims to balance safety robustness with maintaining the model's general utility and efficiency.
Key Capabilities
- Adaptive Safety Reasoning: Activates safety protocols specifically for jailbreak-like semantics.
- Efficiency: Designed to avoid unnecessary safety reasoning on benign prompts, preserving general task effectiveness.
- Jailbreak Robustness: Enhanced ability to resist and respond to jailbreak attacks.
Intended Use Cases
This model is primarily intended for safety-oriented research, focusing on:
- Safety alignment methodologies.
- Research and development concerning small language models.
- Investigating and improving jailbreak robustness in LLMs.
For more details, refer to the associated paper: "EASE: Practical and Efficient Safety Alignment for Small Language Models" (AAAI26)(oral).