datalab-to/lift
VISION | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 19, 2026 | License: openrail | Architecture: Transformer | 0.1K | Open Weights | Cold
lift is a 9 billion parameter structured extraction model developed by Datalab, designed to pull structured JSON data from PDFs and images. It uses schema-constrained decoding to guarantee valid, well-typed output for any provided JSON schema. It extracts data from multi-page documents, including values that span pages, which makes it suited to document processing and data automation tasks.
Datalab lift: Structured Data Extraction from Documents
lift is a 9 billion parameter model from Datalab, built for structured data extraction from PDFs and images. It uses schema-constrained decoding to generate valid, well-typed JSON that conforms to any provided JSON schema.
Key Capabilities
- Schema-Constrained Extraction: Guarantees valid JSON output matching a user-defined schema, supporting string, number, integer, boolean, arrays, and nested objects.
- Multi-Page Document Handling: Processes entire multi-page documents in a single pass, including fields and values that span pages.
- Flexible Inference: Supports two inference modes: local (Hugging Face Transformers) and remote (a vLLM server).
- Developer Tools: Includes a command-line interface (CLI) for single files, inline schemas, or directories, and a Streamlit-based Schema Studio for schema development and testing.
- Performance: Achieves 90.2% field accuracy on a 225-document benchmark, with a median latency of 9.5 seconds per document when served with vLLM.
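To make "valid, well-typed" output concrete, here is a minimal sketch of a schema that uses the types listed above, plus a standard-library check that a parsed output matches it. This is not lift's API: the schema fields (vendor, total, line_items, and so on), the sample output, and the conforms helper are all hypothetical, and the checker covers only the small subset of JSON Schema named in this card.

```python
import json

# Hypothetical invoice schema using the types the card lists:
# string, number, integer, boolean, arrays, and nested objects.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "page_count": {"type": "integer"},
        "paid": {"type": "boolean"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["vendor", "total", "line_items"],
}


def conforms(value, schema):
    """Return True if value matches this small subset of JSON Schema."""
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "array":
        item_schema = schema.get("items", {})
        return isinstance(value, list) and all(
            conforms(item, item_schema) for item in value
        )
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(
            conforms(value[key], sub) for key, sub in props.items() if key in value
        )
    # Missing or unhandled "type": this sketch accepts the value.
    return True


# Hypothetical model output, parsed from a JSON string.
sample_output = json.loads(
    '{"vendor": "Acme", "total": 120.5, "paid": false, '
    '"line_items": [{"description": "Widget", "amount": 120.5}]}'
)
bad_output = {"vendor": "Acme", "total": "120.5", "line_items": []}

print(conforms(sample_output, invoice_schema))
print(conforms(bad_output, invoice_schema))
```

The second output fails only because total is a string rather than a number, which is the kind of type error that schema-constrained decoding is meant to rule out at generation time. A downstream check like this can still be useful as a guard when outputs pass through other systems.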
Good For
- Automating data extraction from invoices, forms, and other structured or semi-structured documents.
- Ensuring data integrity with schema-guaranteed output.
- Developers needing a robust, deployable solution for converting unstructured document content into structured data formats.