Model Overview

Thesby/Qwen3-VL-8B-NSFW-Caption-V4.5 is an 8-billion-parameter multimodal large language model, fine-tuned from Qwen/Qwen3-VL-8B-Instruct using LoRA and designed specifically for advanced image captioning. It was trained on approximately 2 million high-quality image-text pairs drawn from public and private datasets, covering a wide range of SFW (Safe for Work) and NSFW (Not Safe for Work) scenarios.

Key Capabilities

Ultra-High Quality Captions: Generates highly detailed descriptions that capture core subjects, backgrounds, emotions, materials, and lighting, with caption quality described as comparable to Gemini-2.5-Flash.
SFW & NSFW Content Support: Identifies and describes both SFW and NSFW image content.
Long-form Detailed Description: Excels at producing extensive, multi-hundred-word descriptions for complex image scenes, analyzing narrative structures and underlying meanings.
Short Video Description: Supports short video analysis by processing frames extracted at 1 FPS.
Improved English Prompt Handling: Addresses earlier issues where English prompts requesting image descriptions were refused.
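
The 1 FPS frame extraction mentioned under Short Video Description can be illustrated with a small sketch. This is not the model's own preprocessing code; the function name frame_indices_at_fps and its parameters are hypothetical, and the sketch only computes which frame indices to keep. Decoding the frames themselves would be done with a video library of your choice.

```python
def frame_indices_at_fps(total_frames: int, native_fps: float, target_fps: float = 1.0) -> list[int]:
    """Pick the frame indices to keep when sampling a clip at target_fps.

    Hypothetical helper for illustration only; the model card states that
    frames are extracted at 1 FPS but does not publish the exact procedure.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []

    # Number of source frames between two sampled frames.
    # Clamped to 1 so we never sample faster than the clip's own frame rate.
    step = max(native_fps / target_fps, 1.0)

    indices = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# A 3-second clip recorded at 30 FPS yields one frame per second.
print(frame_indices_at_fps(total_frames=90, native_fps=30.0))  # [0, 30, 60]
```

Sampling by index keeps the sketch independent of any particular decoder. The selected frames would then be passed to the model as a sequence of images.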

Intended Uses

Automated Content Annotation: Generating high-quality descriptions and tags for large image datasets in content management, retrieval, and recommendation systems.
Accessibility Features: Describing image content for visually impaired users.
Creative Content Generation: Providing image-based text descriptions as inspiration for art, storytelling, and advertising copy.
Digital Content Analysis: Automating analysis and archiving of various image content, including SFW and NSFW.

Limitations

Like other large models, this model may hallucinate (describe details that are not present in the image) and reproduce biases found in its training data. Users should exercise caution, especially with NSFW content, and ensure compliance with local regulations. The model's output should not be treated as fact or used for critical decision-making without human review.
