cooperleong00/Qwen2.5-7B-Instruct-Jailbroken Overview
This model is a modified version of Qwen2.5-7B-Instruct, developed by cooperleong00 and specifically engineered to exhibit reduced refusal behaviors. It applies weight orthogonalization, a technique described in recent academic research that removes a learned 'refusal direction' from the model's weight matrices, to achieve this 'jailbroken' state. The model's primary purpose is to facilitate academic research in AI safety and model alignment, allowing researchers to study model responses in scenarios where a typical instruction-tuned model would refuse to engage.
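Conceptually, weight orthogonalization projects the refusal direction out of every matrix that writes into the model's residual stream, so that direction can no longer be expressed. The PyTorch sketch below illustrates the idea and is not the author's exact code: it assumes a unit-norm vector `refusal_dir` has already been estimated (see the estimation sketch after the Key Characteristics list) and a loaded `model` with a Llama-style layer layout, which Qwen2.5 follows.

```python
import torch

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component along `direction` from a matrix that writes
    into the residual stream (shape d_model x d_in): W <- W - d d^T W."""
    d = direction / direction.norm()            # ensure unit norm
    return weight - torch.outer(d, d) @ weight

# Both the attention output projection and the MLP down-projection write
# into the residual stream, so both are edited in place at every layer.
# `model` and `refusal_dir` are assumed to be in scope.
for layer in model.model.layers:
    layer.self_attn.o_proj.weight.data = orthogonalize(
        layer.self_attn.o_proj.weight.data, refusal_dir)
    layer.mlp.down_proj.weight.data = orthogonalize(
        layer.mlp.down_proj.weight.data, refusal_dir)
```

In principle, the edited model retains its other capabilities while losing the ability to represent the ablated direction, which is what suppresses refusals.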
Key Characteristics
- Base Model: Built upon the robust Qwen/Qwen2.5-7B-Instruct architecture.
- Parameter Count: Features 7.6 billion parameters, offering a balance of capability and computational efficiency.
- Context Length: Supports a context window of 131,072 tokens (128K), enabling processing of long inputs.
- Multilingual Support: Capable of processing and generating text in numerous languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
- Jailbreaking Method: Achieved through weight orthogonalization, driven by a contrastive dataset built from JailbreakBench and Alpaca-cleaned samples, with HarmBench content excluded for research purposes (the direction-estimation step is sketched below).
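The refusal direction itself is typically estimated as the difference between mean residual-stream activations on harmful versus harmless instructions. The sketch below reconstructs that step under stated assumptions rather than reproducing the author's exact pipeline: `harmful` and `harmless` are placeholder prompt lists (e.g. JailbreakBench behaviors and Alpaca-cleaned instructions), and `LAYER` is an illustrative choice that would normally be selected empirically.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"   # start from the unmodified base model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

LAYER = 16  # illustrative; the most effective layer is found by sweeping

def mean_last_token_activation(prompts: list[str]) -> torch.Tensor:
    """Average the chosen layer's hidden state at the final prompt token."""
    acts = []
    for prompt in prompts:
        messages = [{"role": "user", "content": prompt}]
        ids = tok.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt")
        with torch.no_grad():
            out = model(ids, output_hidden_states=True)
        acts.append(out.hidden_states[LAYER][0, -1].float())
    return torch.stack(acts).mean(dim=0)

# harmful / harmless are placeholder lists of instruction strings.
refusal_dir = (mean_last_token_activation(harmful)
               - mean_last_token_activation(harmless))
refusal_dir = refusal_dir / refusal_dir.norm()  # unit-norm direction
```

The resulting unit vector is what the orthogonalization sketch above projects out of the weight matrices.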
Intended Use Cases
- AI Safety Research: Ideal for academic studies investigating model refusal mechanisms and developing new alignment techniques.
- Model Alignment Studies: Provides a platform for exploring how models respond to prompts that typically trigger safety filters.
- Ethical AI Development: Useful for understanding the boundaries and limitations of current safety measures in large language models.
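For these research settings, the checkpoint loads like any other Qwen2.5 model through the Hugging Face transformers API. The snippet below is a minimal, illustrative harness; the probe prompt and generation settings are placeholders, not part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cooperleong00/Qwen2.5-7B-Instruct-Jailbroken"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Placeholder probe prompt for a refusal-behavior study.
messages = [{"role": "user", "content": "Explain why you might refuse a request."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tok.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```

Responses can then be compared against the unmodified Qwen/Qwen2.5-7B-Instruct on the same prompts to quantify the change in refusal behavior.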