akiFQC/LFM2.5-1.2B-JP-202606-Conf-Extract
The akiFQC/LFM2.5-1.2B-JP-202606-Conf-Extract model, developed by akiFQC, is a 1.2 billion parameter language model based on LiquidAI/LFM2.5-1.2B-JP-202606 and fine-tuned to extract confidential proper nouns from Japanese text. It identifies 11 categories of sensitive information, such as addresses, names, and financial data, and outputs them as structured JSON. With a context length of 32768 tokens, the model targets PII extraction from internal documents, logs, and emails.
Overview
akiFQC/LFM2.5-1.2B-JP-202606-Conf-Extract is a 1.2 billion parameter model, built upon the LiquidAI/LFM2.5-1.2B-JP-202606 base, specifically designed for the extraction of confidential proper nouns from Japanese text. It processes input text and outputs a single-line JSON object containing identified entities across 11 distinct categories. This model is available in SafeTensors (transformers) format, with a GGUF version also provided for llama.cpp and on-device deployment.
Key Capabilities
- Confidential Information Extraction: Identifies and extracts 11 categories of sensitive data: address, company_name, email_address, human_name, phone_number, account_identifier, network_identifier, system_config, project_info, financial_info, and transaction_id.
- Structured Output: Presents extracted information as a single-line JSON object in which all 11 keys are always present, with empty lists [] for categories that do not occur in the text.
- Japanese Language Focus: Optimized for processing Japanese text, making it suitable for internal documents, logs, and emails within Japanese-speaking contexts.
- Context Length: Fine-tuned with a maximum sequence length of 2048 tokens, so it is best suited to moderately sized texts, even though the model's context window is 32768 tokens.
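The structured output described above can be checked on the consumer side. The following is a minimal sketch, assuming the raw model output is already available as a string; the helper name parse_extraction is hypothetical and not part of the model's tooling. It parses the single-line JSON and guarantees that all 11 keys are present, falling back to empty lists when the output is malformed.

```python
import json

# The 11 category keys listed in the model card.
CATEGORIES = [
    "address", "company_name", "email_address", "human_name",
    "phone_number", "account_identifier", "network_identifier",
    "system_config", "project_info", "financial_info", "transaction_id",
]


def parse_extraction(raw: str) -> dict[str, list[str]]:
    """Parse one line of model output into a dict that has all 11 keys.

    Hypothetical helper: malformed JSON or unexpected value types are
    treated as "nothing extracted" rather than raising an error.
    """
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    result: dict[str, list[str]] = {}
    for key in CATEGORIES:
        value = data.get(key, [])
        if not isinstance(value, list):
            value = []
        # Keep only string entries; anything else is dropped.
        result[key] = [item for item in value if isinstance(item, str)]
    return result


if __name__ == "__main__":
    sample = '{"human_name": ["山田太郎"], "email_address": ["taro@example.com"]}'
    print(parse_extraction(sample))
```

Treating unparseable output as an empty result keeps downstream code simple, but a production pipeline would probably want to log such cases instead of silently discarding them.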
Use Cases
- Data Loss Prevention (DLP): Automatically identify and flag sensitive information in corporate communications and documents.
- Compliance and Auditing: Assist in ensuring adherence to data privacy regulations by highlighting PII and confidential data.
- Security Monitoring: Extract network identifiers, system configurations, and account identifiers from logs for security analysis.
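For the DLP use case, one possible follow-up step is to mask the extracted strings in the original text. The sketch below assumes the extraction result is a dict mapping category names to lists of strings, as in the JSON format described in this card; the function name redact and the [CATEGORY] placeholder style are illustrative choices, not something the model provides.

```python
def redact(text: str, entities: dict[str, list[str]]) -> str:
    """Replace every extracted string in text with a [CATEGORY] placeholder."""
    # Flatten to (value, category) pairs, skipping empty strings.
    pairs = [
        (value, category)
        for category, values in entities.items()
        for value in values
        if value
    ]
    # Replace longer strings first so that a short entity contained in a
    # longer one does not split the longer match.
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    for value, category in pairs:
        text = text.replace(value, f"[{category.upper()}]")
    return text


if __name__ == "__main__":
    message = "山田太郎さん (taro@example.com) に連絡してください。"
    found = {"human_name": ["山田太郎"], "email_address": ["taro@example.com"]}
    print(redact(message, found))
```

Plain string replacement masks every occurrence of an extracted value, which is usually the desired behavior for redaction but can over-mask very short entities.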
Important Considerations
- The model's output is text-only, and extraction accuracy depends on data quality and context length.
- It is intended as an auxiliary tool, not a replacement for rule-based filtering, and may produce false positives or negatives. Post-extraction verification steps are recommended for high-precision use cases.
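A post-extraction verification step could look like the following sketch. It makes two assumptions that are not stated in the model card: that an extracted string which does not appear verbatim in the input is a likely false positive, and that a simple rule-based pattern (here a deliberately loose email regex) is a reasonable supplement for catching false negatives. The names verify and EMAIL_RE are hypothetical.

```python
import re

# Deliberately simple pattern for illustration; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def verify(text: str, entities: dict[str, list[str]]):
    """Split extracted entities into verified and rejected sets.

    An entity is verified only if it appears verbatim in the source text.
    Emails found by the regex but missed by the model are added afterwards.
    """
    verified: dict[str, list[str]] = {}
    rejected: dict[str, list[str]] = {}
    for category, values in entities.items():
        verified[category] = [v for v in values if v and v in text]
        rejected[category] = [v for v in values if not (v and v in text)]

    # Rule-based supplement: add any email the model did not report.
    emails = verified.setdefault("email_address", [])
    for match in EMAIL_RE.findall(text):
        if match not in emails:
            emails.append(match)
    return verified, rejected


if __name__ == "__main__":
    source = "問い合わせ先: taro@example.com、担当は山田です。"
    extracted = {"human_name": ["山田", "佐藤"], "email_address": []}
    ok, dropped = verify(source, extracted)
    print(ok)
    print(dropped)
```

The verbatim check only catches strings the model invented or altered; it cannot tell whether a string that does appear in the text is actually confidential, so human or rule-based review remains necessary for high-precision use.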