stockmark/Stockmark-DocReasoner-Qwen2.5-VL-32B
Stockmark-DocReasoner-Qwen2.5-VL-32B is a 32 billion parameter vision-language model developed by Stockmark Inc., built upon Qwen2.5-VL-32B-Instruct. It is specialized for Japanese document understanding and reasoning, particularly within the manufacturing domain, and is designed to extract implicit knowledge from visually rich and complex documents like technical drawings and business reports. The model features explicit Chain-of-Thought reasoning capabilities and multi-modal understanding across documents, charts, tables, and diagrams.
Loading preview...
What is Stockmark-DocReasoner-Qwen2.5-VL-32B?
Stockmark-DocReasoner-Qwen2.5-VL-32B is a 32 billion parameter vision-language model (VLM) developed by Stockmark Inc. It is an instruction-tuned model based on Qwen2.5-VL-32B-Instruct, specifically enhanced for Japanese document understanding and reasoning, with a focus on the manufacturing domain.
Key Capabilities & Features
- Domain Specialization: Optimized for manufacturing and business documents, including technical documentation, engineering drawings, experimental reports, and business documents.
- Multi-modal Understanding: Capable of interpreting information across various visual elements such as documents, charts, tables, and diagrams.
- Chain-of-Thought (CoT) Reasoning: Incorporates explicit "thinking" capabilities to provide structured reasoning processes alongside final answers.
- Structured Output Modes: Supports special inference modes for converting documents into structured HTML, Markdown, JSON, or extracting chemical structures into SMILES format.
Performance Highlights
The model's performance was evaluated on Japanese document understanding benchmarks, including JA-Business-Doc-RQ-Bench, JDocQA, and BusinessSlideVQA. On the JA-Business-Doc-RQ-Bench, it achieved an overall score of 85.15, demonstrating strong performance in document-specific question answering (96.61% for document image types). In JDocQA, it scored 0.31 accuracy, outperforming Qwen3-VL-32B-Thinking and Qwen2.5-VL-32B-Instruct. For BusinessSlideVQA, it achieved 77.27% accuracy.
When to Use This Model
This model is ideal for applications requiring advanced understanding and reasoning over complex, visually rich Japanese business and technical documents, especially within the manufacturing sector. Its CoT capabilities and structured output options make it suitable for tasks needing explainable AI and precise data extraction.