Felldude/Qwen3-VL-4B-Instruct-Uncensored-FP8

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 28, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Felldude/Qwen3-VL-4B-Instruct-Uncensored-FP8 is a 4 billion parameter vision-language model, finetuned with image and text pairs at 1024px resolution. This model demonstrates high affinity with limited hallucination specifically on NSFW tasks. It also possesses limited video captioning capabilities, making it suitable for specialized visual content analysis.

Loading preview...

Felldude/Qwen3-VL-4B-Instruct-Uncensored-FP8 Overview

This model is a 4 billion parameter vision-language model developed by Felldude, designed for multimodal understanding. It has been specifically finetuned using image and text pairs at a resolution of 1024 pixels.

Key Capabilities

  • Vision-Language Understanding: Processes both image and text inputs.
  • Reduced Hallucination: Exhibits limited hallucination, particularly when handling NSFW content.
  • NSFW Task Affinity: Shows high performance and understanding for tasks involving NSFW imagery.
  • Limited Video Captioning: Possesses some ability to generate captions for video content.
  • FP8 Optimization: Utilizes blockwise conversion to FP8 during training, with key blocks retaining FP32 precision for performance.

Good For

  • Specialized NSFW Content Analysis: Its finetuning makes it particularly effective for tasks requiring understanding and processing of NSFW images with reduced hallucination.
  • Multimodal Applications: Suitable for applications that require joint processing of visual and textual information.
  • Video Captioning (Limited): Can be used for basic video captioning tasks where its limited capabilities are sufficient.

Note: For optimal performance, it is recommended to use the full FP32 training when quantizing, as detailed in the original model repository.