monkeyslikebananas/Qwen3-VL-8B-NSFW-Caption-V4.5
Thesby/Qwen3-VL-8B-NSFW-Caption-V4.5 is an 8 billion parameter multimodal large language model, fine-tuned from Qwen/Qwen3-VL-8B-Instruct, specializing in high-quality image captioning. It excels at generating ultra-detailed, long-form descriptions for both SFW and NSFW content, capturing intricate visual details, and supports short video description by frame extraction. This model is optimized for comprehensive image understanding and text generation, making it suitable for automated content annotation and accessibility features.
Model Overview
Thesby/Qwen3-VL-8B-NSFW-Caption-V4.5 is an 8 billion parameter multimodal large language model, fine-tuned from Qwen/Qwen3-VL-8B-Instruct using LoRA and designed for advanced image captioning. It was trained on approximately 2 million high-quality image-text pairs drawn from public and private datasets that cover a wide range of SFW (Safe for Work) and NSFW (Not Safe for Work) scenarios.
Key Capabilities
- Ultra-High Quality Captions: Generates highly detailed descriptions, capturing core subjects, backgrounds, emotions, materials, and lighting, with performance comparable to Gemini-2.5-Flash.
- SFW & NSFW Content Support: Capable of effectively identifying and describing both SFW and NSFW image content, broadening its applicability.
- Long-form Detailed Description: Excels at producing extensive, multi-hundred-word descriptions for complex image scenes, analyzing narrative structures and underlying meanings.
- Short Video Description: Supports short video analysis by processing frames extracted at 1 FPS.
- Improved English Prompt Handling: Addresses an issue in previous versions where English prompts asking for image descriptions were refused.
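The 1 FPS frame extraction mentioned under Short Video Description can be sketched as a simple index calculation. This is a minimal illustration, not the model's own preprocessing code: the function name `sample_frame_indices` and its parameters are hypothetical, and it assumes you already know the clip's frame count and native frame rate (for example, from your video decoder of choice).

```python
def sample_frame_indices(total_frames, native_fps, target_fps=1.0):
    """Return indices of frames to keep when sampling at target_fps.

    Hypothetical helper for illustration only; the model card states
    that frames are extracted at 1 FPS but does not specify how.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []

    # Number of source frames between two sampled frames.
    # Clamped to 1 so we never emit the same frame twice when
    # target_fps exceeds the clip's native frame rate.
    step = max(native_fps / target_fps, 1.0)

    indices = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# A 3-second clip recorded at 30 FPS yields one frame per second.
print(sample_frame_indices(total_frames=90, native_fps=30.0))
```

The selected frames would then be decoded and passed to the model as a sequence of images. How many frames fit in a single request depends on the context length and image resolution, which the card does not specify.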
Intended Uses
- Automated Content Annotation: Generating high-quality descriptions and tags for large image datasets in content management, retrieval, and recommendation systems.
- Accessibility Features: Describing image content for visually impaired users.
- Creative Content Generation: Providing image-based text descriptions as inspiration for art, storytelling, and advertising copy.
- Digital Content Analysis: Automating analysis and archiving of various image content, including SFW and NSFW.
Limitations
Like other large models, it may exhibit hallucinations (generating non-existent details) and data biases reflecting its training data. Users should exercise caution, especially with NSFW content, and ensure compliance with local regulations. The model's output should not be considered absolute fact or used for critical decision-making without human review.