cooperleong00/Qwen2.5-7B-Instruct-Jailbroken
cooperleong00/Qwen2.5-7B-Instruct-Jailbroken is a 7.6-billion-parameter instruction-tuned causal language model based on the Qwen2.5 architecture, developed by cooperleong00. Its weights have been modified via weight orthogonalization to reduce refusal behaviors, making it suitable for academic research in AI safety and model alignment. It supports a wide range of languages, including Chinese, English, French, Spanish, and more, and offers a context length of 131,072 tokens.
cooperleong00/Qwen2.5-7B-Instruct-Jailbroken Overview
This model is a modified version of Qwen2.5-7B-Instruct, developed by cooperleong00 and engineered to exhibit reduced refusal behaviors. It achieves this "jailbroken" state through weight orthogonalization, a technique detailed in recent academic research. Its primary purpose is to facilitate academic research in AI safety and model alignment, allowing researchers to study model responses in scenarios where typical instruction-tuned models would refuse to engage.
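Conceptually, weight orthogonalization ablates a "refusal direction" from the model's weight matrices so the network can no longer write that direction into its residual stream. A minimal linear-algebra sketch of that projection step (illustrative only; in practice the direction `r` is estimated from contrastive harmful/harmless prompt activations, which is not shown here):

```python
import numpy as np

def orthogonalize_weights(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the direction r out of the column space of W.

    W: weight matrix whose outputs live in the residual stream, shape (d_model, d_in).
    r: candidate "refusal direction" in the residual stream, shape (d_model,).
    Returns W' = (I - r r^T) W, so W' can no longer write along r.
    """
    r = r / np.linalg.norm(r)          # work with a unit direction
    return W - np.outer(r, r) @ W      # subtract the component of W along r

# Illustrative check: after orthogonalization, W' has no component along r.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
r = rng.normal(size=8)
W_prime = orthogonalize_weights(W, r)
```

After this edit, `r @ W_prime` is (numerically) zero: every output of the modified matrix is orthogonal to the ablated direction.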
Key Characteristics
- Base Model: Built upon the Qwen/Qwen2.5-7B-Instruct architecture.
- Parameter Count: Features 7.6 billion parameters, offering a balance of capability and computational efficiency.
- Context Length: Supports an extensive context window of 131,072 tokens, enabling processing of long inputs.
- Multilingual Support: Capable of processing and generating text in numerous languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
- Jailbreaking Method: Achieved via weight orthogonalization, using a dataset derived from JailBreakBench and Alpaca-cleaned samples (HarmfulBench content was excluded for research purposes).
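Since this is a standard Qwen2.5-architecture checkpoint, it can be loaded like any other causal LM on the Hugging Face Hub. A hypothetical usage sketch with the `transformers` library (the `build_messages` helper and system prompt are illustrative assumptions, not part of the model card; actual generation assumes the weights have been downloaded and, for `device_map="auto"`, that `accelerate` is installed):

```python
MODEL_ID = "cooperleong00/Qwen2.5-7B-Instruct-Jailbroken"

def build_messages(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat format expected by Qwen2.5 chat templates."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},  # illustrative system prompt
        {"role": "user", "content": prompt},
    ]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imports are deferred so the helper above works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the conversation with the model's built-in chat template.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Summarize weight orthogonalization in one paragraph."))
```

Given the research-only intent stated above, any outputs produced this way should be handled under an appropriate safety-research protocol.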
Intended Use Cases
- AI Safety Research: Ideal for academic studies investigating model refusal mechanisms and developing new alignment techniques.
- Model Alignment Studies: Provides a platform for exploring how models respond to prompts that typically trigger safety filters.
- Ethical AI Development: Useful for understanding the boundaries and limitations of current safety measures in large language models.