Felldude/Qwen3-VL-4B-Instruct-Uncensored
Felldude/Qwen3-VL-4B-Instruct-Uncensored is a 4 billion parameter vision-language model, based on the Qwen3-VL architecture, fine-tuned by Felldude. This model is specifically optimized for processing image and text pairs at 1024px resolution, demonstrating high affinity and reduced hallucination on NSFW tasks. It also possesses limited video captioning capabilities, making it suitable for applications requiring robust visual understanding and content generation.
Loading preview...
Overview
Felldude/Qwen3-VL-4B-Instruct-Uncensored is a 4 billion parameter vision-language model, fine-tuned by Felldude. It is built upon the Qwen3-VL architecture and has undergone full FP32 training using AdamW, without 8-bit optimizers. This model is distinguished by its specific optimization for handling image and text pairs at a resolution of 1024px.
Key Capabilities
- High Affinity with Reduced Hallucination: The model shows strong performance and limited hallucination, particularly when dealing with NSFW content.
- Image-Text Processing: Optimized for understanding and generating content from combined image and text inputs at 1024px resolution.
- Limited Video Captioning: Possesses some ability to generate captions for video content.
Good For
- Applications requiring robust visual understanding, especially with high-resolution images.
- Use cases involving NSFW content where reduced hallucination is critical.
- Exploratory tasks in video captioning, given its limited but present capability.