LiquidAI/LFM2-1.2B-Extract
TEXT GENERATIONConcurrency Cost:1Model Size:1.2BQuant:BF16Ctx Length:32kPublished:Aug 22, 2025License:lfm1.0Architecture:Transformer0.1K Cold
LFM2-1.2B-Extract by LiquidAI is a 1.2 billion parameter language model specifically designed for extracting structured information from diverse unstructured documents. Based on LFM2-1.2B, it excels at converting text into formats like JSON, XML, or YAML. This model supports 9 languages and is optimized for accurate and faithful data extraction, outperforming larger models in complex object output.
Loading preview...
LFM2-1.2B-Extract: Structured Information Extraction
LFM2-1.2B-Extract, developed by LiquidAI, is a 1.2 billion parameter model built upon LFM2-1.2B, specializing in the extraction of critical information from unstructured documents. It transforms varied text sources, such as articles, transcripts, and reports, into structured formats like JSON, XML, or YAML.
Key Capabilities
- Structured Output Generation: Converts unstructured text into valid JSON, XML, or YAML based on provided schemas.
- Multilingual Support: Supports English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.
- Robust Performance: Evaluated on 5,000 documents across 100+ topics, demonstrating high syntax scores, format accuracy, and keyword faithfulness. It can output complex objects in multiple languages at a level comparable to models significantly larger, such as Gemma 3 27B.
- Optimized for Accuracy: Recommends greedy decoding (
temperature=0) and providing a specific system prompt with schema for improved accuracy. - Synthetic Data Training: Trained on a diverse synthetic dataset covering various document types, domains, styles, lengths, and languages to ensure robust performance.
Good For
- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.
- Single-turn conversational extraction tasks.