sukhrobnurali/qwen3vl-resume-parser
The sukhrobnurali/qwen3vl-resume-parser is an 8 billion parameter QLoRA fine-tune of Qwen/Qwen3-VL-8B-Instruct, developed by Sukhrob Nurali. This vision-language model is specifically optimized to parse resume/CV page images and extract information into a fixed 23-field JSON record. It offers a reduced VRAM footprint (~23 GB BF16) compared to larger models, making it suitable for structured data extraction from resumes in recruiting pipelines.
Loading preview...
Model Overview
sukhrobnurali/qwen3vl-resume-parser is an 8 billion parameter QLoRA fine-tune of the Qwen/Qwen3-VL-8B-Instruct vision-language model, developed by Sukhrob Nurali. It was created as an internal project at Corporate Solutions Group to provide a more efficient resume parsing solution. The model is published as merged full weights (BF16 safetensors), loading like a standard Qwen3-VL checkpoint without requiring adapter attachment.
Key Capabilities
- Resume-to-JSON Extraction: Specialized in converting resume/CV page images into a structured 23-field JSON record, including identity, contact, skills, experiences, and education.
- Optimized Schema: The 23-field schema and specific formatting rules are baked into the model's weights, simplifying prompts for structured output.
- Reduced VRAM Footprint: Operates with approximately 23 GB VRAM in BF16 at 16K context, significantly less than the 50 GB required by the 32B parameter model it replaces.
- Performance: Achieves an 83.9% weighted score and 88.2% unweighted score on a 51-sample held-out evaluation set, with 88.2% JSON validity.
When to Use This Model
- Structured Resume Data Extraction: Ideal for extracting specific, predefined data points from resume images for recruiting or ATS pipelines.
- Cost-Effective Parsing: Suitable when aiming to reduce GPU costs for resume parsing while maintaining parsing quality.
- Batch Processing: Designed for batch or offline processing due to an average inference time of ~92.0 seconds per resume on an A100.
Limitations
- Domain and Language Skew: Primarily trained on English, IT/software-centric resumes; performance may degrade on non-technical, unusual layouts, or non-English documents.
- Schema Lock-in: The model is tightly coupled to its specific 23-field schema and enum vocabularies, which may not align with different downstream requirements.
- JSON Validity: Approximately 12% of outputs may be invalid JSON, requiring defensive parsing in downstream applications.