datalab-to/lift

VISION · Concurrency Cost: 1 · Model Size: 9B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 19, 2026 · License: openrail · Architecture: Transformer · 0.1K · Open Weights · Cold

lift is a 9-billion-parameter structured extraction model developed by Datalab that pulls structured JSON data from PDFs and images. It uses schema-constrained decoding to guarantee valid, well-typed output that conforms to any provided JSON schema. It extracts data from multi-page documents, including values that span pages, which suits it to document processing and data automation tasks.


Datalab lift: Structured Data Extraction from Documents

lift is a 9-billion-parameter model from Datalab built for structured data extraction from PDFs and images. Using schema-constrained decoding, it generates valid, well-typed JSON that conforms to any provided JSON schema.
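To make "valid, well-typed JSON" concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than part of lift's API: the invoice schema, the sample outputs, and the `conforms` helper are assumptions made for this example. The checker covers only a small subset of JSON Schema (`type`, `properties`, `required`, `items`) and uses only the standard library.

```python
import json

# Hypothetical invoice schema (illustration only, not shipped with lift).
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "page_count": {"type": "integer"},
        "paid": {"type": "boolean"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total", "line_items"],
}


def conforms(value, schema):
    """Check a parsed JSON value against a small subset of JSON Schema."""
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "array":
        item_schema = schema.get("items", {})
        return isinstance(value, list) and all(
            conforms(item, item_schema) for item in value
        )
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(conforms(v, props[k]) for k, v in value.items() if k in props)
    return True  # missing or unrecognized "type": accept anything


# Made-up outputs: the first matches the schema, the second has a mistyped total.
good_output = json.loads(
    '{"invoice_number": "INV-001", "total": 120.5, "paid": false,'
    ' "line_items": [{"description": "Widget", "amount": 120.5}]}'
)
bad_output = json.loads(
    '{"invoice_number": "INV-001", "total": "120.5", "line_items": []}'
)

print(conforms(good_output, INVOICE_SCHEMA))  # True
print(conforms(bad_output, INVOICE_SCHEMA))   # False
```

With schema-constrained decoding, a check like this should never fail on the model's output, because non-conforming output cannot be generated in the first place. A validator of this kind is still useful for catching mistakes in the schema itself or in downstream transformations.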

Key Capabilities

  • Schema-Constrained Extraction: Guarantees valid JSON output matching a user-defined schema, supporting the string, number, integer, and boolean types as well as arrays and nested objects.
  • Multi-Page Document Handling: Processes entire multi-page documents in a single pass, including fields and values that span pages.
  • Flexible Inference: Supports two inference modes, local (HuggingFace Transformers) and remote (vLLM server).
  • Developer Tools: Includes a command-line interface (CLI) for single files, inline schemas, or directories, and a Streamlit-based Schema Studio for schema development and testing.
  • Performance: Achieves 90.2% field accuracy on a 225-document benchmark, with a median latency of 9.5 seconds per document when served with vLLM.
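The benchmark's scoring rule is not described here. As one plausible reading of "field accuracy", the sketch below flattens the expected and predicted JSON into leaf fields and counts exact matches. The helper names `flatten` and `field_accuracy` are hypothetical, the exact-match rule is an assumption, and the toy data is made up for illustration; none of it reproduces the reported 90.2% figure.

```python
def flatten(value, prefix=""):
    """Map every leaf of a nested JSON value to a dotted path."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    leaves = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        leaves.update(flatten(child, path))
    return leaves


def field_accuracy(expected, predicted):
    """Fraction of expected leaf fields that the prediction matches exactly."""
    want = flatten(expected)
    got = flatten(predicted)
    if not want:
        return 1.0
    missing = object()  # sentinel so an absent field never equals a real value
    correct = sum(
        1 for path, value in want.items() if got.get(path, missing) == value
    )
    return correct / len(want)


# Toy example (made-up data): one of four leaf fields is wrong.
expected = {
    "invoice_number": "INV-001",
    "total": 120.5,
    "line_items": [{"description": "Widget", "amount": 120.5}],
}
predicted = {
    "invoice_number": "INV-001",
    "total": 125.0,
    "line_items": [{"description": "Widget", "amount": 120.5}],
}

print(field_accuracy(expected, predicted))  # 0.75
```

A real benchmark might normalize whitespace, dates, or number formats before comparing, or weight fields differently, so treat this as a starting point for an in-house evaluation rather than a reconstruction of Datalab's metric.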

Good For

  • Automating data extraction from invoices, forms, and other structured or semi-structured documents.
  • Ensuring data integrity with schema-guaranteed output.
  • Giving developers a deployable way to convert unstructured document content into structured data.