MayankLad31/invoice_schema

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 3, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

MayankLad31/invoice_schema is a 1.5 billion parameter model, fine-tuned from Qwen2.5-Coder with reinforcement learning (GRPO), that extracts structured JSON from OCR text according to a user-defined schema. It specializes in invoice processing: the user defines a schema, and the model fills it with the matching information from the scanned document. Its primary use case is local, schema-driven data extraction from OCR output.


Overview

MayankLad31/invoice_schema is a 1.5 billion parameter model, fine-tuned from Qwen2.5-Coder with reinforcement learning (GRPO), for extracting structured JSON data from OCR (Optical Character Recognition) text. It takes the text of a document, such as an invoice, and converts it into a JSON object whose structure the user defines.

Key Capabilities

  • Schema-driven JSON Extraction: Users can provide any JSON schema, and the model will attempt to extract corresponding data from the input text.
  • OCR Integration: Designed to work in conjunction with OCR tools (like PaddleOCR) to process scanned documents or images.
  • Local Deployment: The model is available in GGUF format, so it can run entirely on local hardware for privacy-sensitive or offline applications.
  • Invoice Processing: Particularly effective for extracting details like invoice numbers, recipient information, payment details, itemized lists, and totals from invoice documents.
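
The schema-driven flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's documented interface: the prompt wording, the flat `{"field": "type"}` schema layout, and the helper names `build_prompt` and `parse_extraction` are all assumptions made for the example. The model call itself is left out; `raw_output` stands in for whatever text the model returns.

```python
import json


def build_prompt(schema: dict, ocr_text: str) -> str:
    """Combine a schema and OCR text into one instruction prompt.

    The wording below is an assumed prompt layout, not the format
    the model was trained on.
    """
    return (
        "Extract the fields described by this JSON schema from the OCR text.\n"
        "Return only a JSON object.\n\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"OCR text:\n{ocr_text}\n"
    )


def parse_extraction(raw_output: str, schema: dict) -> dict:
    """Pull the first JSON object out of the model output and align it to the schema keys."""
    start = raw_output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value and ignores any trailing text.
    data, _ = json.JSONDecoder().raw_decode(raw_output[start:])
    # Keep only schema fields; fields the model did not return become None.
    return {key: data.get(key) for key in schema}


# Hypothetical schema and OCR text for illustration.
schema = {"invoice_number": "string", "total": "number"}
prompt = build_prompt(schema, "Invoice No: INV-001\nTotal: 120.50")

# Stand-in for the model's reply to `prompt`.
raw_output = 'Result:\n{"invoice_number": "INV-001", "total": 120.5, "extra": 1}\nDone.'
record = parse_extraction(raw_output, schema)
```

Aligning the parsed object to the schema keys keeps downstream code simple: every requested field is present, and anything the model added beyond the schema is dropped.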

Good for

  • Automating data entry from scanned invoices or receipts.
  • Developing custom document processing pipelines where data structure is critical.
  • Applications requiring local, offline data extraction capabilities.
  • Developers looking for a specialized model to convert unstructured OCR text into structured JSON based on a flexible schema.
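
In a pipeline where data structure is critical, it may help to check the extracted record before it reaches downstream systems, since the model only attempts to fill the schema. The following is a minimal sketch of such a check; the function `find_type_errors` and the `expected_types` mapping are hypothetical helpers for this example and are not part of the model or its tooling.

```python
def _type_name(expected) -> str:
    """Render a type or a tuple of types as readable text."""
    if isinstance(expected, tuple):
        return " or ".join(t.__name__ for t in expected)
    return expected.__name__


def find_type_errors(record: dict, expected_types: dict) -> list:
    """Return one message per field that is missing or has the wrong Python type."""
    errors = []
    for field, expected in expected_types.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {_type_name(expected)}, got {actual}")
    return errors


# Hypothetical expectations for an invoice record.
expected_types = {"invoice_number": str, "total": (int, float), "items": list}

# A record where the total came back as a string and the item list is absent.
problems = find_type_errors(
    {"invoice_number": "INV-001", "total": "120.50"}, expected_types
)
```

Records that produce a non-empty list can be routed to manual review instead of being entered automatically.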