Sanraj/Qwen3-1.7B-Jailbreak-reasoning

Hugging Face
Text Generation · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Jan 11, 2026 · License: MIT · Architecture: Transformer · Open Weights

Sanraj/Qwen3-1.7B-Jailbreak-reasoning is a 1.7-billion-parameter Qwen3-based language model fine-tuned by Santhos Raj to generate structured Chain-of-Thought (CoT) reasoning. It produces transparent reasoning steps before its final response and operates in two modes: a 'Normal Mode' for aligned outputs and a 'Jailbreak Mode' for less restricted reasoning. The model is primarily intended for research into AI alignment, robustness, and controlled jailbreak behavior, offering insight into model decision-making processes.


Model Overview

This model, Sanraj/Qwen3-1.7B-Jailbreak-reasoning, is a fine-tuned version of the Qwen3-1.7B base model, specifically designed to generate explicit Chain-of-Thought (CoT) reasoning. Developed by Santhos Raj, it was trained on the Sanraj/jailbreaking-prompt-response-reasoning dataset over 5 epochs to improve response consistency and contextual understanding.
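A minimal inference sketch using the Transformers library is shown below. The loading path, chat-template call, and generation settings are assumptions based on standard Qwen3 usage; the model card does not specify exact inference parameters. Imports are deferred so the sketch reads without `transformers` installed.

```python
def generate(prompt: str, model_id: str = "Sanraj/Qwen3-1.7B-Jailbreak-reasoning") -> str:
    """Generate a response (including its chain-of-thought trace) for one user turn.

    This is a sketch under assumed defaults, not the author's documented recipe.
    """
    # Deferred imports: only needed when the function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # BF16 matches the quantization listed in the card metadata.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    # Qwen3 checkpoints ship a chat template; apply it to a single user message.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The returned string should contain the reasoning tags described under Key Capabilities, followed by the final answer.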

Key Capabilities

  • Structured Reasoning: Generates step-by-step logic within <start_working_out>...<end_working_out> tags before providing a solution.
  • Dual-Mode Operation: Supports two distinct behaviors:
    • Normal Mode: Produces contextually aware, safe, and aligned outputs, with reasoning that supports helpful solutions.
    • Jailbreak Mode: Activated by specific trigger words; produces less restricted reasoning and responses, intended for robustness and safety research.
  • Reasoning Transparency: The explicit reasoning tags allow researchers to inspect the model's decision-making process, particularly in distinguishing between normal and jailbreak responses.
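The capabilities above can be exercised programmatically. A small helper like the following separates the reasoning trace from the final answer; the delimiter tags come from the model card, while treating everything after the closing tag as the final answer is an assumption about the output format:

```python
import re

# Delimiters documented on the model card; DOTALL lets the trace span lines.
REASONING_RE = re.compile(r"<start_working_out>(.*?)<end_working_out>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning trace, final answer).

    If no reasoning tags are present, the whole text is treated as the answer.
    """
    match = REASONING_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # assumption: answer follows the closing tag
    return reasoning, answer

# Hypothetical model output used for illustration.
reply = (
    "<start_working_out>The user asks for 2 + 2. "
    "Adding gives 4.<end_working_out>The answer is 4."
)
trace, answer = split_reasoning(reply)
# trace  -> "The user asks for 2 + 2. Adding gives 4."
# answer -> "The answer is 4."
```

Inspecting `trace` separately from `answer` is what makes the model useful for the alignment-research scenarios listed below.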

Good For

  • AI Alignment Research: Investigating and understanding model behavior under different prompt conditions.
  • Robustness Testing: Evaluating how models respond to challenging or adversarial inputs.
  • Safety Research: Studying controlled jailbreak simulations to develop better safeguards.
  • Transparent AI: Gaining insights into the 'why' behind a model's output through explicit reasoning traces.