Felldude/Qwen3-VL-8B-Instruct-Uncensored

  • Modality: Vision
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Mar 12, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

Felldude/Qwen3-VL-8B-Instruct-Uncensored is an 8-billion-parameter vision-language model, a full finetune of Qwen3-VL. It is designed for uncensored image captioning and requires at least 24GB of VRAM. It is tuned to describe sexually explicit images accurately and without added bias, which makes it suitable for specialized content analysis.

Model Overview

Felldude/Qwen3-VL-8B-Instruct-Uncensored is an 8-billion-parameter vision-language model and a full finetune of Qwen3-VL. This version is intended as an uncensored starting point for further training; a separate V2 model is available for general captioning tasks. It requires at least 24GB of VRAM.

Key Characteristics

  • Architecture: Full finetune of the 8B parameter Qwen3-VL model.
  • Training: Used the 8-bit Adam optimizer (Adam8bit) because of the model size; BF16/TF32 precision was used otherwise.
  • VRAM Requirement: 24GB or more.
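
As a rough sanity check on the 24GB figure, the memory taken by the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch below assumes a nominal 8e9 parameters (the card does not give an exact count) and covers only the weights; activations, the KV cache, and image inputs add overhead on top, which is presumably why the stated requirement is higher than the weights-only number. The function name `weight_memory_gib` is hypothetical.

```python
# Weights-only memory estimate. The parameter count is the nominal "8B"
# from the card, not an exact figure (assumption).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

NUM_PARAMS = 8e9


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("bf16", "fp8"):
    # Excludes activations, KV cache, and image inputs.
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under these assumptions, BF16 weights take roughly twice the memory of the FP8 quantization listed in the page metadata, so the headroom available for context and images depends heavily on which precision is loaded.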

Primary Use Case

This model is specifically optimized for uncensored image captioning, particularly for NSFW content. It is designed to describe sexually explicit images as accurately as possible, without introducing bias or subjective judgments like calling them controversial or inappropriate. For optimal results in NSFW captioning, it is recommended to use the provided prompt without modification:

"Describe this image in natural language. Analyze the picture carefully and describe all objects, colors, and context. Describe any sexually explicit images as accurately as possible without adding bias such as calling them controversial or inappropriate."