PanduAldi/raw-ocr-to-json-model

Task: Text Generation
Concurrency Cost: 1
Model Size: 0.5B
Quant: BF16
Ctx Length: 32k
Published: Mar 10, 2026
Architecture: Transformer

PanduAldi/raw-ocr-to-json-model is a 0.5-billion-parameter model fine-tuned by PanduAldi to convert raw OCR output into structured JSON. It is based on Qwen2.5-0.5B-Instruct and was trained with Unsloth for faster fine-tuning. Its main use is the post-OCR step of a document pipeline: turning unstructured recognized text into data that downstream code can parse.

Overview

The model takes raw Optical Character Recognition (OCR) output and returns it as structured JSON. It is a fine-tune of Qwen2.5-0.5B-Instruct (0.5 billion parameters) developed by PanduAldi, trained with Unsloth to speed up fine-tuning, and converted to GGUF format.

Key Capabilities

  • OCR to JSON Conversion: Specializes in taking unstructured text from OCR and converting it into a parseable JSON structure.
  • GGUF Format: Distributed as a GGUF file, Qwen2.5-0.5B-Instruct.Q4_K_M.gguf, which GGUF-compatible inference engines can load directly.
  • Ollama Integration: Includes an Ollama Modelfile for straightforward deployment and local execution.
  • Optimized Training: Fine-tuned with Unsloth, which speeds up training.
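
As a rough illustration of the Ollama route, the sketch below sends OCR text to a locally running Ollama server through its /api/generate endpoint. Several details are assumptions and not taken from the model card: the model name raw-ocr-to-json (it depends on the name passed to `ollama create`), the default local address, and the idea that the raw OCR text is sent as the prompt with no extra instructions.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical name: use whatever you passed to `ollama create`.
MODEL_NAME = "raw-ocr-to-json"


def build_payload(ocr_text: str, model: str = MODEL_NAME) -> dict:
    """Build a non-streaming generate request for a piece of raw OCR text."""
    return {
        "model": model,
        "prompt": ocr_text,  # assumption: the model expects the bare OCR text
        "stream": False,     # one JSON reply instead of a stream of chunks
        "format": "json",    # ask Ollama to constrain the output to valid JSON
    }


def ocr_to_json(ocr_text: str, url: str = OLLAMA_URL):
    """Send OCR text to Ollama and return the parsed JSON the model produced."""
    data = json.dumps(build_payload(ocr_text)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    # The generated text is in the "response" field of Ollama's reply.
    return json.loads(body["response"])


if __name__ == "__main__":
    # Made-up OCR text, for illustration only.
    print(ocr_to_json("TOKO MAJU\nTotal 12.500\n10/03/2026"))
```

Only the standard library is used, so the sketch runs without extra dependencies once an Ollama server with the model is available.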

Good For

  • Developers needing to process OCR results into structured data programmatically.
  • Applications requiring automated extraction of key information from scanned documents or images.
  • Local deployment scenarios using llama-cli or llama-mtmd-cli with Jinja templating.
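
When the model is run through a CLI or without constrained output, the reply may wrap the JSON in extra text or a code fence. The following is a minimal sketch of a defensive parser that pulls the first valid JSON object or array out of such a reply. The function name extract_json and the sample field names are hypothetical; the model card does not document the output schema.

```python
import json


def extract_json(raw: str):
    """Return the first JSON object or array found in raw, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(raw):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at index and
                # ignores any text that follows it.
                value, _end = decoder.raw_decode(raw, index)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON here; try the next candidate
    return None


if __name__ == "__main__":
    # Made-up model reply, for illustration only.
    reply = 'Result:\n{"merchant": "TOKO MAJU", "total": "12.500"}\nDone.'
    print(extract_json(reply))
```

Returning None instead of raising lets the caller decide whether to retry the request, log the raw reply, or fall back to manual review.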