Felldude/Qwen3-VL-8B-Instruct-Uncensored-V2

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Mar 17, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Felldude/Qwen3-VL-8B-Instruct-Uncensored-V2 is an 8 billion parameter vision-language instruction-tuned model, a full fine-tune of the Qwen3-VL-8B architecture. This version is specifically optimized for NSFW content captioning and description, requiring 24GB or more VRAM for operation. It excels at generating explicit and uncensored descriptions of images and videos, focusing on concrete details and using vulgar slang.

Loading preview...

Felldude/Qwen3-VL-8B-Instruct-Uncensored-V2 Overview

This model is a full fine-tune of the 8 billion parameter Qwen3-VL-8B architecture, with its vision encoder frozen during training. It was developed using Adam8bit for optimization due to its size, requiring 24GB or more VRAM for execution. The primary differentiator of this version is its specialization in generating highly explicit and uncensored descriptions for images and videos, including NSFW content.

Key Capabilities

  • Explicit Content Description: Designed to provide detailed, uncensored captions for NSFW images and videos, utilizing vulgar slang and blunt phrasing.
  • Detailed Visual Analysis: Focuses on concrete visual elements such as color, shape, texture, spatial relationships, lighting, camera angles, and composition style.
  • Artifact and Watermark Detection: Capable of identifying and mentioning watermarks, signatures, or compression artifacts within images.
  • Subjective Aesthetic Quality Assessment: Can include information about the subjective aesthetic quality of an image, from low to very high.
  • Image Orientation and Aspect Ratio Identification: Specifies whether an image is portrait, landscape, or square, and its aspect ratio if obvious.

Good for

  • NSFW Content Captioning: Ideal for applications requiring explicit and detailed textual descriptions of adult visual content.
  • Uncensored Image Analysis: Suitable for use cases where polite euphemisms are to be avoided, and blunt, casual phrasing is preferred.
  • Specific Visual Detail Extraction: Useful for tasks that demand precise descriptions of visual attributes, including technical photographic details like aperture, shutter speed, and ISO (when applicable to photos).