manuelaschrittwieser/llama-3-invoice-extractor-merged is an 8B-parameter, Llama-3-based model fine-tuned by Manuela Schrittwieser for structured information extraction from financial text. It transforms unstructured invoice descriptions, receipt notes, and purchase logs into valid JSON objects, which makes it a fit for automated accounting and financial data pipelines. The model produced 100% valid JSON output in test runs, a significant improvement over the baseline Llama-3-8B.
Llama-3-8B Invoice Extractor (Merged)
This model, developed by Manuela Schrittwieser, is a fine-tuned version of Meta's Llama-3-8B, specifically optimized for Structured Information Extraction from financial documents. It acts as a "Parser Agent" to convert unstructured text from invoices, receipts, and purchase logs into machine-readable, valid JSON objects.
Key Capabilities
- Strict JSON Output: Produced 100% valid JSON in test runs, whereas the baseline Llama-3-8B tends to return conversational output.
- High Precision Entity Recognition: Accurately maps extracted data to a predefined JSON schema (item, quantity, date, vendor, total, currency).
- Instruction Following: Demonstrates high adherence to extraction instructions, staying within the response block.
- Efficient Training: Trained using QLoRA via the Unsloth library, enabling faster training and reduced VRAM usage.
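Because the valid-JSON figure comes from test runs rather than a hard guarantee, downstream code may still want to check each response. The following is a minimal sketch of such a check, assuming the output is a single JSON object with the six field names listed above. The value types are not stated in this card, so only key presence is checked, and the sample string is hand-written for illustration, not real model output.

```python
import json

# Field names come from the schema listed above; value types are not
# specified in the card, so this sketch only checks that the keys exist.
EXPECTED_FIELDS = {"item", "quantity", "date", "vendor", "total", "currency"}


def parse_extraction(raw_output: str) -> dict:
    """Parse model output and check it is a JSON object with the expected keys."""
    record = json.loads(raw_output)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = EXPECTED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


# Hypothetical model output, written by hand for illustration only.
sample = (
    '{"item": "Office chair", "quantity": 2, "date": "2024-03-15", '
    '"vendor": "IKEA", "total": 198.0, "currency": "EUR"}'
)
record = parse_extraction(sample)
print(record["vendor"], record["total"])
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone to handle both malformed JSON and missing fields.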
Good For
- Automated accounting and bookkeeping workflows.
- Extracting structured data from OCR-processed receipts.
- Building serverless financial data pipelines requiring precise JSON output.
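For the workflows above, one possible way to call the model is through the Hugging Face `transformers` library. This is a rough sketch, not the author's documented usage: the Alpaca-style prompt template and the instruction wording in `build_prompt` are assumptions (this card does not state the training template), and `extract` is a hypothetical helper name. Running it requires `transformers` and `torch`, plus enough memory for an 8B model.

```python
MODEL_ID = "manuelaschrittwieser/llama-3-invoice-extractor-merged"


def build_prompt(text: str) -> str:
    # ASSUMPTION: an Alpaca-style template. The card does not document the
    # actual prompt format, so confirm it before relying on this.
    return (
        "### Instruction:\n"
        "Extract item, quantity, date, vendor, total and currency as JSON.\n\n"
        f"### Input:\n{text}\n\n"
        "### Response:\n"
    )


def extract(text: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: generate the JSON string for one input text."""
    # Imported lazily so build_prompt can be used without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(text), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The sketch reloads the model on every call to keep the example short; a real pipeline would load the tokenizer and model once and reuse them.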
Limitations
- Primarily optimized for English; performance in other languages is not guaranteed.
- Can occasionally misinterpret ambiguous date formats.
- Performs best with inputs shorter than 512 tokens.
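To illustrate the date limitation, the sketch below shows why a string such as 03/04/2024 is ambiguous: it is a valid date under both day-first and month-first readings. The helper `candidate_dates` is a hypothetical pre-check, not part of the model; it could be used to flag inputs whose extracted date deserves manual review.

```python
from datetime import datetime


def candidate_dates(text: str) -> set[str]:
    """Return every ISO date that a slash-separated date string could mean."""
    results = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day-first, then month-first
        try:
            results.add(datetime.strptime(text, fmt).date().isoformat())
        except ValueError:
            pass  # the string is not a valid date under this reading
    return results


ambiguous = candidate_dates("03/04/2024")    # valid under both readings
unambiguous = candidate_dates("25/04/2024")  # only valid as day-first
print(len(ambiguous), len(unambiguous))
```

A result with more than one element means the source text alone cannot settle the date, so the value returned by the model should be treated with caution.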