Name: detoxio-test/Qwen2.5-0.5B-Instruct-Jailbroken API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: detoxio-test

Overview

This model, detoxio-test/Qwen2.5-0.5B-Instruct-Jailbroken, is a 0.5 billion parameter instruction-tuned causal language model. It is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct, developed by detoxio-test. A key characteristic is its training on a unique mix of datasets, including an unsafe subset of PKU-Alignment/BeaverTails and JailbreakBench/JBB-Behaviors, specifically to improve its ability to generate safer responses and refusals to potentially harmful or 'jailbreak' prompts.

Key Capabilities

Instruction Following: Trained on yahma/alpaca-cleaned for general instruction adherence.
Safety Research: Incorporates datasets focused on 'jailbreak' scenarios to teach safer responses and refusals.
Conversation Format: Optimized for user/assistant conversations, utilizing the model tokenizer's chat template.
Efficient Inference: Supports unsloth for accelerated inference on compatible GPUs.

Good For

Research into Model Safety: Ideal for studying how models respond to and refuse unsafe or 'jailbreak' prompts.
Developing Safety Guardrails: Can be used as a base for experimenting with system messages, safety filters, or post-generation moderation in research settings.
Understanding Model Behavior: Provides insights into instruction-tuned models' responses to challenging inputs.

Caution: This model is intended for research and benign use only. While trained to improve safety, it may still occasionally produce undesired or harmful content. Production use requires additional guardrails.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)