joedonino/beni_qwen3vl_4b_product_052226v2_r64_b8

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:May 26, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The joedonino/beni_qwen3vl_4b_product_052226v2_r64_b8 is a 4 billion parameter Qwen3-VL model, developed by joedonino, fine-tuned from unsloth/Qwen3-VL-4B-Instruct-unsloth-bnb-4bit. This model was trained using Unsloth and Huggingface's TRL library, achieving a 2x speedup in the fine-tuning process. It is designed for visual language tasks, leveraging its Qwen3-VL architecture.

Loading preview...

Model Overview

The joedonino/beni_qwen3vl_4b_product_052226v2_r64_b8 is a 4 billion parameter visual language model, fine-tuned by joedonino. It is based on the unsloth/Qwen3-VL-4B-Instruct-unsloth-bnb-4bit architecture.

Key Characteristics

  • Architecture: Qwen3-VL, indicating capabilities in processing both visual and linguistic information.
  • Fine-tuning: The model was fine-tuned using Unsloth and Huggingface's TRL library, which enabled a 2x faster training process.
  • Developer: Developed by joedonino.
  • License: Released under the Apache-2.0 license.

Potential Use Cases

Given its Qwen3-VL base, this model is suitable for applications requiring:

  • Visual Question Answering (VQA): Answering questions based on image content.
  • Image Captioning: Generating descriptive text for images.
  • Multimodal Understanding: Tasks that involve interpreting and generating content from both visual and text inputs.

This model offers a fine-tuned visual language solution, benefiting from accelerated training techniques.