sophy/finetuned-qwen-referrals

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Dec 5, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

The sophy/finetuned-qwen-referrals model is an 8 billion parameter vision-language model, fine-tuned from Qwen3-VL by sophy. It is specifically optimized for extracting structured JSON data from referral form images. This model leverages Unsloth for faster training and is designed for efficient visual document processing tasks.

Loading preview...

Model Overview

The sophy/finetuned-qwen-referrals model is an 8 billion parameter vision-language model (VLM) developed by sophy. It is fine-tuned from the unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit base model, utilizing Unsloth and Hugging Face's TRL library for accelerated training.

Key Capabilities

  • Vision-Language Processing: Integrates visual and textual understanding.
  • Structured Data Extraction: Specifically fine-tuned to extract information from referral form images and output it in a structured JSON format.
  • Optimized Performance: Benefits from Unsloth for faster training and efficient inference, including 4-bit loading for reduced memory usage.
  • Flexible Usage: Can be used with Unsloth's FastVisionModel for streamlined inference or directly with Hugging Face Transformers AutoProcessor and Qwen3VLForConditionalGeneration.

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Automated processing of referral forms.
  • Converting image-based documents into structured, machine-readable data.
  • Healthcare or administrative systems needing to digitize and parse visual records.