shutkit/Qwen2.5-VL-7B-NSFW-Caption-V3-abliterated

VISION | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 32k | Published: Sep 19, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

shutkit/Qwen2.5-VL-7B-NSFW-Caption-V3-abliterated is a 7-billion-parameter vision-language model derived from the Qwen2.5-VL family, with a context length of 32,768 tokens. It has been modified with abliteration techniques to remove refusal behaviors from its text generation, making its output uncensored. It is primarily designed for generating captions for NSFW content without typical AI safety restrictions. The modification affects only the text generation component; image processing is left as in the base model.

Model Overview

shutkit/Qwen2.5-VL-7B-NSFW-Caption-V3-abliterated is a 7-billion-parameter vision-language model based on the Qwen2.5-VL architecture. It is a modified version of thesby/Qwen2.5-VL-7B-NSFW-Caption-V3, altered to remove refusal behaviors from its text generation.

Key Modifications

  • Abliteration Process: The model underwent an "abliteration" process, as detailed in the remove-refusals-with-transformers project. This technique was applied exclusively to the text generation component.
  • Uncensored Output: The primary goal of this modification is to prevent the model from generating phrases like "I'm sorry, but I can't assist with that." The result is uncensored text output, which is particularly relevant for NSFW content captioning.
  • Vision-Language Capabilities: Because only the text component was modified, the vision processing capabilities remain the same as in the base model.

Use Cases

  • Unrestricted NSFW Captioning: Ideal for applications requiring descriptive text for NSFW visual content without built-in refusal mechanisms.
  • Research into AI Safety Bypass: Can be used by researchers studying methods to remove or bypass AI safety filters in language models.