typhoon-ai/typhoon-ocr1.5-2b
Typhoon-AI's Typhoon-OCR1.5-2B is a 2-billion-parameter vision-language model built on Qwen3-VL 2B and designed for optical character recognition (OCR) and document parsing of real-world Thai and English documents. It supports a 32K context length and extracts structured information from diverse document types, including handwritten content, complex forms, and both text-rich and image-rich pages. The model processes images directly rather than relying on PDF metadata and uses a single-prompt architecture, which speeds up inference and simplifies integration.
What is Typhoon-OCR1.5-2B?
Typhoon-OCR1.5-2B is an open-source, 2-billion-parameter vision-language model developed by Typhoon-AI and optimized for optical character recognition (OCR) and document parsing. Built on the Qwen3-VL 2B architecture, it is engineered to handle real-world Thai and English documents, including challenging formats such as handwritten content and complex forms.
Key Enhancements & Capabilities
- Compact and Efficient: Based on Qwen3-VL 2B, it's smaller and more efficient, running effectively on lightweight hardware, especially with quantization.
- Faster Inference: Works directly from images and achieves high layout fidelity without relying on PDF metadata, which makes processing significantly faster.
- Simplified Prompting: Features a single-prompt architecture, streamlining integration and ensuring consistent outputs across various document types.
- Enhanced Document Understanding: Significantly improved at parsing handwritten content, complex forms, and irregular layouts with greater consistency and semantic accuracy.
- Balanced Performance: Adapts intelligently to both text-rich reports and visually complex materials like infographics, ensuring high-quality output.
- Structured Output: Produces machine-friendly outputs in Markdown for general text, HTML for tables, LaTeX for equations, and `<figure>` tags for diagrams, optimized for downstream AI and RAG systems.
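Because the output mixes Markdown, HTML tables, LaTeX, and figure tags, a downstream consumer may want to separate these elements before indexing them. The following is a minimal sketch, assuming the output arrives as a single string in which tables appear as `<table>...</table>`, diagrams as `<figure>...</figure>`, and display equations as `$$...$$`. These delimiters are assumptions and should be checked against real model output; `split_ocr_output` is a hypothetical helper, not part of any Typhoon library.

```python
import re

# Assumed delimiters; verify against actual model output before relying on them.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
FIGURE_RE = re.compile(r"<figure\b.*?</figure>", re.DOTALL | re.IGNORECASE)
EQUATION_RE = re.compile(r"\$\$.*?\$\$", re.DOTALL)


def split_ocr_output(text: str) -> dict:
    """Separate tables, figures, and display equations from the remaining Markdown."""
    parts = {
        "tables": TABLE_RE.findall(text),
        "figures": FIGURE_RE.findall(text),
        "equations": EQUATION_RE.findall(text),
    }
    remaining = text
    for pattern in (TABLE_RE, FIGURE_RE, EQUATION_RE):
        remaining = pattern.sub("", remaining)
    # Collapse the runs of blank lines left behind by the removed blocks.
    parts["markdown"] = re.sub(r"\n{3,}", "\n\n", remaining).strip()
    return parts


# Hand-written sample for illustration only; not real model output.
sample = (
    "# Invoice\n\n"
    "<table><tr><td>Item</td><td>Price</td></tr></table>\n\n"
    "$$ total = a + b $$\n\n"
    "<figure>Bar chart of monthly sales</figure>\n\n"
    "Thank you."
)
result = split_ocr_output(sample)
print(len(result["tables"]), len(result["figures"]), len(result["equations"]))
print(result["markdown"])
```

Regular expressions are enough for flat, non-nested blocks like these; nested tables or inline `$...$` math would need a real HTML or Markdown parser.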
When to Use This Model
- Document Intelligence: Ideal for extracting structured data from diverse documents, including financial tables, academic papers, government forms, and receipts.
- Multilingual OCR: Specifically designed for both Thai and English documents.
- Resource-Constrained Environments: Its compact size and efficiency make it suitable for deployment on lightweight hardware or for faster local inference.
- Integration into LLM Pipelines: Its standardized, structured output format (Markdown, HTML, LaTeX) makes it easy to integrate into RAG systems and other AI workflows.
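For RAG ingestion, one common approach is to chunk the parsed text along its Markdown headings so that each chunk stays topically coherent. The sketch below is one possible way to do this, assuming the model output uses ATX-style headings (`#`, `##`, ...) and that each HTML table is emitted without blank lines inside it. `chunk_by_heading` and `max_chars` are hypothetical names introduced here for illustration.

```python
import re
from typing import List


def chunk_by_heading(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split Markdown at headings; split oversized sections at blank lines."""
    # Zero-width split: each heading line starts a new section.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Pack paragraph-level blocks greedily up to max_chars.
        # A single block longer than max_chars is kept whole, so a table
        # with no internal blank lines is never cut in half.
        current = ""
        for block in section.split("\n\n"):
            if current and len(current) + len(block) + 2 > max_chars:
                chunks.append(current)
                current = block
            else:
                current = f"{current}\n\n{block}" if current else block
        if current:
            chunks.append(current)
    return chunks


# Hand-written sample for illustration only; not real model output.
doc = (
    "# Title\n\nIntro.\n\n"
    "## Table\n\n<table><tr><td>1</td></tr></table>"
)
chunks = chunk_by_heading(doc)
for chunk in chunks:
    print(repr(chunk))
```

Character counts are used here only to keep the sketch dependency-free; a production pipeline would more likely measure chunk size in tokens of the embedding model.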
Important Considerations
- This is a task-specific model intended for use only with its provided prompt structure. It does not include general VQA capabilities or guardrails.
- Users should be aware of potential hallucinations inherent in LLMs and assess risks for their specific use case.