kucingcoder/raw-ocr-to-json

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Mar 7, 2026 | Architecture: Transformer

The kucingcoder/raw-ocr-to-json model is a 0.5-billion-parameter instruction-tuned language model based on the Qwen2.5 architecture. It was fine-tuned and converted to GGUF format using Unsloth. It is designed to convert raw OCR output into structured JSON, which makes it suitable for data extraction and document processing workflows.

Overview

kucingcoder/raw-ocr-to-json is a 0.5-billion-parameter language model built on the Qwen2.5 architecture and distributed in GGUF format. It was fine-tuned with Unsloth, which made training 2x faster. The model is designed for one task: converting raw Optical Character Recognition (OCR) output into structured JSON data.

Key Capabilities

  • OCR to JSON Conversion: Specializes in transforming unstructured text from OCR into a parseable JSON format.
  • GGUF Format: Available in GGUF format, making it compatible with llama.cpp command-line tools such as llama-cli and llama-mtmd-cli.
  • Ollama Support: Includes an Ollama Modelfile for streamlined deployment and integration.
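
As an illustration of the Ollama route, the sketch below sends raw OCR text to a locally running Ollama server and parses the reply. It makes several assumptions that the model card does not confirm: the local model name `raw-ocr-to-json` is hypothetical (it is whatever name was chosen when the Modelfile was imported), the model is assumed to accept the raw OCR text directly as the prompt, and the server is assumed to listen on Ollama's default port.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical local model name; use whatever name the Modelfile was imported under.
MODEL_NAME = "raw-ocr-to-json"


def build_payload(ocr_text: str, model: str = MODEL_NAME) -> dict:
    """Build a non-streaming generate request that asks Ollama for JSON output."""
    return {
        "model": model,
        "prompt": ocr_text,   # assumption: the model takes raw OCR text as-is
        "stream": False,      # return one complete response instead of chunks
        "format": "json",     # ask Ollama to constrain the output to valid JSON
    }


def ocr_to_json(ocr_text: str, url: str = OLLAMA_URL):
    """Send raw OCR text to a running Ollama server and parse the reply."""
    body = json.dumps(build_payload(ocr_text)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as reply:
        envelope = json.loads(reply.read().decode("utf-8"))
    # The generated text is carried in the "response" field of the envelope.
    return json.loads(envelope["response"])
```

Calling `ocr_to_json("INVOICE No 123 TOTAL 45.00")` requires a running Ollama server with the model loaded; `build_payload` can be inspected on its own without one.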

Good For

  • Automated Data Extraction: Ideal for projects requiring the extraction of structured information from scanned documents or images.
  • Document Processing: Useful in workflows that involve converting document content into machine-readable JSON for further analysis or storage.
  • Edge Device Deployment: The compact 0.5B parameter size and GGUF format make it suitable for deployment on devices with limited resources.
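
In extraction and document processing pipelines, it is usually worth parsing the model's output defensively, since a small model may occasionally wrap the JSON in extra text or omit a field. The following is a minimal sketch of that idea using only the Python standard library. The field names in the usage note (`total`, `items`) are hypothetical; the model card does not document an output schema.

```python
import json


def extract_json(text: str):
    """Return the first JSON object or array embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    return None


def missing_keys(record: dict, required: list) -> list:
    """List the required keys that are absent from a parsed record."""
    return [key for key in required if key not in record]
```

For example, `extract_json('Result: {"total": "12.50", "items": []} end')` yields the embedded object, and `missing_keys(record, ["total", "items", "date"])` then reports which expected fields a downstream step would need to handle.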