Disty0/Qwen3-VL-8B-NSFW-Caption-V4.5

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Dec 21, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Disty0/Qwen3-VL-8B-NSFW-Caption-V4.5 is an 8 billion parameter vision-language model, reuploaded from thesby/Qwen3-VL-8B-NSFW-Caption-V4.5, with a context length of 32768 tokens. This model is specifically fine-tuned for generating captions for NSFW (Not Safe For Work) visual content. Its primary application is in automated NSFW image description and content moderation systems.

Loading preview...

Model Overview

Disty0/Qwen3-VL-8B-NSFW-Caption-V4.5 is an 8 billion parameter vision-language model, originally developed by thesby and reuploaded by Disty0. It is built upon the Qwen3-VL architecture, known for its multimodal capabilities, and features a substantial context length of 32768 tokens.

Key Capabilities

  • NSFW Caption Generation: The model is specifically fine-tuned to generate descriptive captions for Not Safe For Work (NSFW) visual content.
  • Vision-Language Integration: It processes both visual inputs and generates textual outputs, making it suitable for image understanding tasks.
  • High Context Length: With 32768 tokens, it can handle complex visual scenes or longer descriptive outputs.

Good For

  • Automated Content Moderation: Identifying and describing potentially inappropriate visual content for filtering or review systems.
  • Specialized Image Tagging: Generating detailed tags or descriptions for NSFW images in specific datasets.
  • Research in NSFW Content Analysis: Providing a tool for studying and understanding the characteristics of NSFW visual data through automated captioning.