LiquidAI/LFM2-350M-PII-Extract-JP

TEXT GENERATIONConcurrency Cost:1Model Size:0.35BQuant:BF16Ctx Length:32kPublished:Sep 30, 2025License:lfm1.0Architecture:Transformer0.1K Cold

LiquidAI's LFM2-350M-PII-Extract-JP is a 0.35 billion parameter language model, based on LFM2-350M, specifically fine-tuned for extracting personally identifiable information (PII) from Japanese text. It identifies addresses, company names, email addresses, human names, and phone numbers, outputting them in JSON format. This model achieves GPT-5 level performance for PII extraction, enabling efficient on-device masking of sensitive data in documents like contracts and medical reports.

Loading preview...

Overview

LiquidAI's LFM2-350M-PII-Extract-JP is a specialized 0.35 billion parameter model derived from LFM2-350M, engineered for robust PII extraction from Japanese text. It processes input and outputs identified entities in a structured JSON format, making it ideal for privacy-focused applications requiring on-device data masking.

Key Capabilities

  • Targeted PII Extraction: Specifically trained to identify and extract Japanese addresses/locations, company/institute/organization names, email addresses, human names, and phone numbers.
  • JSON Output: Presents extracted information as a JSON object, with empty lists for categories where no entities are found, and lists of strings for identified entities.
  • High Performance: Achieves performance comparable to GPT-5 on PII extraction tasks, despite its significantly smaller size, as demonstrated on 1,000 samples from the finepdf dataset.
  • On-Device Suitability: Its compact 350M parameter count allows for efficient execution directly on devices, bringing cloud-grade PII extraction capabilities to edge applications.
  • Exact Entity Output: The model is designed to output entities exactly as they appear in the source text, including variations, to facilitate precise masking.

Usage Recommendations

  • Greedy Decoding: Strongly recommended to use temperature=0 for generation.
  • System Prompt: Requires a specific system prompt, e.g., Extract <address>, <company_name>, <email_address>, <human_name>, <phone_number>, with entity categories in alphabetical order for optimal performance.
  • Single-Turn Conversations: Optimized for single-turn interactions.

Limitations and Future Development

While highly effective for its defined categories, the model is a foundational tool. Future enhancements, potentially through community fine-tuning, could include support for organization-specific identification numbers, additional PII categories like date of birth or passport numbers, and further performance improvements on specific entity types.