LiquidAI/LFM2-350M-Extract

TEXT GENERATIONConcurrency Cost:1Model Size:0.35BQuant:BF16Ctx Length:32kPublished:Sep 3, 2025License:lfm1.0Architecture:Transformer0.1K Cold

LiquidAI's LFM2-350M-Extract is a 350 million parameter language model, based on LFM2-350M, specifically engineered for extracting structured information from diverse unstructured documents. It excels at converting text into formats like JSON, XML, or YAML, supporting a 32768 token context length. This model is optimized for tasks such as invoice detail extraction, regulatory filing conversion, and populating knowledge graphs across English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Loading preview...

LFM2-350M-Extract: Structured Information Extraction

LFM2-350M-Extract, developed by LiquidAI, is a 350 million parameter model derived from LFM2-350M, specialized in transforming unstructured text into structured data formats such as JSON, XML, or YAML. It is designed to process a wide array of documents, including articles, transcripts, and reports, with a notable context length of 32768 tokens.

Key Capabilities and Features

  • Structured Data Extraction: Converts free-form text into machine-readable JSON, XML, or YAML based on specified schemas.
  • Multilingual Support: Capable of processing documents in English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.
  • Optimized for Accuracy: Recommends greedy decoding (temperature=0) and schema-guided system prompts for improved output accuracy.
  • Performance: Outperforms larger models like Gemma 3 4B in extraction tasks, as evaluated across 5,000 documents using metrics like Syntax score, Format accuracy, Keyword faithfulness, and LLM-based absolute/relative scoring.
  • Synthetic Data Training: Trained on a diverse synthetic dataset covering various document types, domains, styles, and lengths to ensure robust performance.
  • Single-Turn Conversations: Primarily intended for single-turn interactions for extraction tasks.

Ideal Use Cases

  • Business Process Automation: Automating the extraction of invoice details from emails or converting regulatory filings.
  • Data Integration: Transforming customer support tickets into structured formats for analytics pipelines.
  • Knowledge Graph Population: Extracting entities and attributes from reports to build or update knowledge graphs.
  • Document Processing: Efficiently converting diverse unstructured documents into consistent, structured outputs for downstream applications.