PanduAldi/raw-ocr-to-json-model
PanduAldi/raw-ocr-to-json-model is a 0.5-billion-parameter model fine-tuned by PanduAldi to convert raw OCR output into structured JSON. It is based on Qwen2.5-0.5B-Instruct and was trained with Unsloth for faster fine-tuning. Its main use is the post-OCR step of a document pipeline, where unstructured recognized text has to be turned into structured data.
Overview
PanduAldi/raw-ocr-to-json-model is a 0.5-billion-parameter model, based on Qwen2.5-0.5B-Instruct, that has been fine-tuned to transform raw Optical Character Recognition (OCR) output into structured JSON. It was developed by PanduAldi, trained with Unsloth to speed up fine-tuning, and converted to GGUF format.
Key Capabilities
- OCR to JSON Conversion: Specializes in taking unstructured text from OCR and converting it into a parseable JSON structure.
- GGUF Format: Available in GGUF format, specifically Qwen2.5-0.5B-Instruct.Q4_K_M.gguf, making it compatible with various inference engines.
- Ollama Integration: Includes an Ollama Modelfile for straightforward deployment and local execution.
- Optimized Training: Benefits from Unsloth's optimization for faster finetuning.
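Because the model's output is meant to be parseable JSON, downstream code will usually want to validate the reply before using it. The following is a minimal sketch, not part of the model card: the helper name extract_json and the sample fields (invoice_no, total) are hypothetical, and the actual keys the model emits depend on its training data.

```python
import json


def extract_json(reply: str):
    """Return the first JSON object found in `reply`, or None if there is none.

    Assumption: a small model may wrap its JSON in extra text, so we scan
    for each '{' and try to decode a complete object starting there.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this position; try the next brace.
            start = reply.find("{", start + 1)
    return None


# Hypothetical model reply, used only for illustration.
sample_reply = 'Here is the result:\n{"invoice_no": "INV-001", "total": 125000}\nDone.'
parsed = extract_json(sample_reply)
```

Returning None instead of raising lets the caller decide whether to retry the generation or flag the document for manual review.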
Good For
- Developers needing to process OCR results into structured data programmatically.
- Applications requiring automated extraction of key information from scanned documents or images.
- Local deployment scenarios using llama-cli or llama-mtmd-cli with Jinja templating.
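For the local deployment case, a script can assemble the llama-cli invocation programmatically. This is a sketch under stated assumptions: the helper build_llama_cli_command is hypothetical, the model file is assumed to sit in the working directory, the sample OCR text is invented, and the prompt format the fine-tuned model expects is not documented here, so the raw OCR text is passed as the prompt unchanged.

```python
import shlex


def build_llama_cli_command(model_path: str, ocr_text: str, max_tokens: int = 256):
    """Build an argument list for llama-cli (hypothetical helper).

    Uses only common llama.cpp options: -m (model file), --jinja (apply the
    model's chat template), -n (max tokens to generate), -p (prompt).
    """
    return [
        "llama-cli",
        "-m", model_path,
        "--jinja",
        "-n", str(max_tokens),
        "-p", ocr_text,
    ]


# Assumed local path and invented OCR text, for illustration only.
cmd = build_llama_cli_command(
    "Qwen2.5-0.5B-Instruct.Q4_K_M.gguf",
    "TOKO MAJU\nTOTAL 125.000",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The list can be handed to subprocess.run(cmd, capture_output=True, text=True) once llama-cli is on the PATH. Keeping the arguments as a list avoids shell-quoting problems with OCR text that contains newlines or quotes.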