Sanraj/Qwen3-1.7B-jailbreak-finetuned

Text Generation · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Oct 19, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Sanraj/Qwen3-1.7B-jailbreak-finetuned is a 1.7-billion-parameter Qwen3 model, fine-tuned by Santhos Raj for research into AI alignment and robustness. It features a dual-mode operation: a Normal Mode for safe, aligned responses and a Jailbreak Mode, activated by specific trigger words in the prompt, that produces less restricted outputs. The model was trained on the Sanraj/jailbreaking-prompt-response dataset to explore controlled jailbreak behavior and contextual understanding.


Model Overview

Sanraj/Qwen3-1.7B-jailbreak-finetuned is a 1.7-billion-parameter model based on the Qwen3 architecture, fine-tuned by Santhos Raj. Its primary purpose is to facilitate research into AI alignment and robustness by exploring controlled jailbreak behavior. The model was trained for 10 epochs on the Sanraj/jailbreaking-prompt-response dataset and learned stably, reaching a final training loss of ~2.0 and a validation loss of ~2.4.

Key Capabilities

  • Dual-Mode Operation: The model operates in two distinct modes (see the inference sketch after this list):
    • Normal Mode: Provides safe, aligned, and contextually aware responses.
    • Jailbreak Mode: Activated by specific trigger words in the prompt, this mode permits less restricted outputs and is intended strictly for research and robustness testing.
  • Contextual Understanding: Fine-tuning improves response consistency and awareness of conversational context.
  • Robustness Testing: Designed to help researchers evaluate model behavior under challenging or adversarial prompts.
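A minimal Normal Mode inference sketch with the Transformers library is shown below. The model ID comes from this card; the prompt and generation settings are illustrative, the availability of a chat template is assumed (Qwen3 checkpoints ship one), and since the Jailbreak Mode trigger words are not documented here, only Normal Mode is exercised.

```python
# Minimal inference sketch. Assumes the checkpoint is publicly available on
# the Hub and ships a chat template; prompt and settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sanraj/Qwen3-1.7B-jailbreak-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 weights
    device_map="auto",
)

# Normal Mode: an ordinary prompt containing no trigger words.
messages = [{"role": "user", "content": "Explain what model alignment means."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```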

Training Details

The model was fine-tuned with PyTorch and Transformers using the AdamW optimizer, a linear decay learning-rate scheduler, bfloat16 precision, gradient accumulation, and a learning rate of 2e-5. Training focused on minimizing validation loss to ensure generalization.
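The sketch below is a hedged reconstruction of that recipe using the Trainer API, with the hyperparameters stated above (AdamW, linear decay, bf16, gradient accumulation, lr 2e-5, 10 epochs). The base checkpoint ID, batch sizes, sequence length, dataset field names, and the existence of a validation split are assumptions, not documented details.

```python
# Hedged reconstruction of the stated fine-tuning recipe.
# Assumptions: base checkpoint "Qwen/Qwen3-1.7B", "prompt"/"response" dataset
# fields, batch sizes, max length, and a "validation" split (the card only
# reports a validation loss).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "Qwen/Qwen3-1.7B"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16")

dataset = load_dataset("Sanraj/jailbreaking-prompt-response")

def tokenize(example):
    # Assumed schema: concatenate prompt and response into one training text.
    text = example["prompt"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset["train"].column_names)

args = TrainingArguments(
    output_dir="qwen3-1.7b-jailbreak-finetuned",
    num_train_epochs=10,              # card: trained for 10 epochs
    learning_rate=2e-5,               # card: learning rate 2e-5
    lr_scheduler_type="linear",       # card: linear decay scheduler
    optim="adamw_torch",              # card: AdamW optimizer
    bf16=True,                        # card: bfloat16 precision
    per_device_train_batch_size=2,    # batch sizes are assumptions
    gradient_accumulation_steps=8,    # card mentions gradient accumulation
    eval_strategy="epoch",            # monitor validation loss each epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized.get("validation"),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```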

Ethical Considerations

This model's jailbreak simulation capability is intended exclusively for research and testing of AI alignment and robustness. It must not be used to generate harmful or unethical content. Users should implement safety filters for any production or user-facing deployment.
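As one illustration of such a filter, the sketch below wraps a generated response in a simple keyword blocklist check. The blocklist contents and refusal message are placeholders, and real deployments should rely on a dedicated moderation model or API rather than keyword matching alone.

```python
# Illustrative output filter for user-facing deployments. The blocklist and
# refusal text are hypothetical placeholders; keyword matching alone is not
# sufficient moderation for production use.
BLOCKLIST = {"example_trigger_word"}  # hypothetical trigger terms

def filter_response(prompt: str, response: str) -> str:
    # Flag the exchange if either the prompt or the response contains a
    # blocklisted term; otherwise pass the response through unchanged.
    tokens = set(prompt.lower().split()) | set(response.lower().split())
    if tokens & BLOCKLIST:
        return "Request declined: this deployment does not serve unrestricted outputs."
    return response
```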