Overview
This model, detoxio-test/Qwen2.5-0.5B-Instruct-Jailbroken, is a 0.5 billion parameter instruction-tuned causal language model. It is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct, developed by detoxio-test. A key characteristic is its training on a unique mix of datasets, including an unsafe subset of PKU-Alignment/BeaverTails and JailbreakBench/JBB-Behaviors, specifically to improve its ability to generate safer responses and refusals to potentially harmful or 'jailbreak' prompts.
Key Capabilities
- Instruction Following: Trained on
yahma/alpaca-cleaned for general instruction adherence. - Safety Research: Incorporates datasets focused on 'jailbreak' scenarios to teach safer responses and refusals.
- Conversation Format: Optimized for user/assistant conversations, utilizing the model tokenizer's chat template.
- Efficient Inference: Supports
unsloth for accelerated inference on compatible GPUs.
Good For
- Research into Model Safety: Ideal for studying how models respond to and refuse unsafe or 'jailbreak' prompts.
- Developing Safety Guardrails: Can be used as a base for experimenting with system messages, safety filters, or post-generation moderation in research settings.
- Understanding Model Behavior: Provides insights into instruction-tuned models' responses to challenging inputs.
Caution: This model is intended for research and benign use only. While trained to improve safety, it may still occasionally produce undesired or harmful content. Production use requires additional guardrails.