aashish093/qwen3-vl-4b-scheme-extract

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 11, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The aashish093/qwen3-vl-4b-scheme-extract model is a 4 billion parameter Qwen3-VL architecture, fine-tuned by aashish093. This model was optimized for training speed using Unsloth and Huggingface's TRL library, building upon the unsloth/Qwen3-VL-4B-Instruct-unsloth-bnb-4bit base. It is designed for visual language tasks, leveraging its Qwen3-VL foundation for multimodal understanding and generation.

Loading preview...

Model Overview

The aashish093/qwen3-vl-4b-scheme-extract is a 4 billion parameter visual language model, fine-tuned by aashish093. It is based on the Qwen3-VL architecture, specifically building upon the unsloth/Qwen3-VL-4B-Instruct-unsloth-bnb-4bit model.

Key Characteristics

  • Architecture: Qwen3-VL, indicating its capability for multimodal tasks involving both visual and linguistic inputs.
  • Parameter Count: 4 billion parameters, offering a balance between performance and computational efficiency.
  • Training Optimization: The fine-tuning process for this model was accelerated using Unsloth and Huggingface's TRL library, resulting in a 2x faster training time compared to standard methods.
  • License: Distributed under the Apache-2.0 license.

Potential Use Cases

This model is suitable for applications requiring:

  • Visual Language Understanding: Tasks that involve interpreting and generating responses based on both images and text.
  • Efficient Deployment: Its 4B parameter size and optimized training suggest it could be a good candidate for scenarios where faster inference or reduced resource consumption is beneficial.
  • Further Fine-tuning: Developers looking for a Qwen3-VL base that has undergone optimized fine-tuning might find this model a good starting point for specific downstream tasks.