sophy/finetuned-qwen-referrals
VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Dec 5, 2025License:apache-2.0Architecture:Transformer Open Weights Cold
The sophy/finetuned-qwen-referrals model is an 8 billion parameter vision-language model, fine-tuned from Qwen3-VL by sophy. It is specifically optimized for extracting structured JSON data from referral form images. This model leverages Unsloth for faster training and is designed for efficient visual document processing tasks.
Loading preview...
Model Overview
The sophy/finetuned-qwen-referrals model is an 8 billion parameter vision-language model (VLM) developed by sophy. It is fine-tuned from the unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit base model, utilizing Unsloth and Hugging Face's TRL library for accelerated training.
Key Capabilities
- Vision-Language Processing: Integrates visual and textual understanding.
- Structured Data Extraction: Specifically fine-tuned to extract information from referral form images and output it in a structured JSON format.
- Optimized Performance: Benefits from Unsloth for faster training and efficient inference, including 4-bit loading for reduced memory usage.
- Flexible Usage: Can be used with Unsloth's
FastVisionModelfor streamlined inference or directly with Hugging Face TransformersAutoProcessorandQwen3VLForConditionalGeneration.
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Automated processing of referral forms.
- Converting image-based documents into structured, machine-readable data.
- Healthcare or administrative systems needing to digitize and parse visual records.