dcipheranalytics/phi-2-pii-bbi

Text Generation | Concurrency Cost: 1 | Model Size: 3B | Quant: BF16 | Ctx Length: 2k | Published: Feb 6, 2024 | Architecture: Transformer

The dcipheranalytics/phi-2-pii-bbi model is a 3 billion parameter causal language model based on the Microsoft Phi-2 architecture. It is fine-tuned to identify Personally Identifiable Information (PII) in text, particularly in banking and insurance contexts. It is aimed at extracting sensitive data from customer service conversations, which makes it suitable for data anonymization and privacy compliance tasks. Its context length is 2,048 tokens, so longer conversations need to be split before processing.


Model Overview

The dcipheranalytics/phi-2-pii-bbi model is a 3 billion parameter language model built on the Microsoft Phi-2 architecture. It specializes in identifying and extracting Personally Identifiable Information (PII) from text, with a particular focus on banking and insurance data.

Key Capabilities

  • PII Extraction: Designed to accurately pinpoint and list PII within given text inputs.
  • Domain-Specific Training: Fine-tuned on a proprietary dataset of GPT-4-generated customer service conversations covering 200 banking topics and 100 insurance topics, so its training distribution matches these two sectors.
  • Conversational Data Processing: Optimized for handling and analyzing conversational text, making it suitable for customer interaction logs and support tickets.
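
To make the extraction workflow concrete, here is a minimal sketch of how the model might be called with the Hugging Face transformers library. It rests on two assumptions not confirmed by this page: that the repository hosts full weights loadable with AutoModelForCausalLM (rather than an adapter), and that an instruction-style prompt works. The wording in build_prompt is hypothetical; the template actually used during fine-tuning is not documented here and should be checked against the original model card.

```python
MODEL_ID = "dcipheranalytics/phi-2-pii-bbi"
CONTEXT_LENGTH = 2048  # context length stated for this model


def build_prompt(conversation: str) -> str:
    """Build a HYPOTHETICAL instruction-style prompt.

    The real fine-tuning template is not given on this page, so treat
    this wording as a placeholder.
    """
    return (
        "Instruction: List the PII in the following conversation.\n"
        f"Conversation:\n{conversation.strip()}\n"
        "PII:"
    )


def extract_pii(conversation: str, max_new_tokens: int = 128) -> str:
    """Generate the model's raw PII listing for one conversation."""
    # Imported lazily so build_prompt can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumes full model weights are published under MODEL_ID.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    # Truncate the prompt so prompt + generated tokens fit in the context window.
    inputs = tokenizer(
        build_prompt(conversation),
        return_tensors="pt",
        truncation=True,
        max_length=CONTEXT_LENGTH - max_new_tokens,
    )
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for repeatable extraction
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Greedy decoding is chosen here because extraction output should be repeatable across runs; the format of the returned string depends on the fine-tuning data and would need to be parsed accordingly.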

Performance Metrics

Evaluation results indicate strong performance in PII identification:

  • Average Performance: Achieves an average precision of 0.836, recall of 0.781, and an F1-score of 0.802 across its training data.
  • Topic-Specific Performance: Detailed per-topic evaluation shows varied but generally robust performance across different PII categories.
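
As a reading aid for the figures above: the F1-score is the harmonic mean of precision and recall, and applying it directly to the reported averages gives about 0.808 rather than 0.802. One plausible explanation, which is an assumption and not something this page states, is that the reported F1 is a mean of per-topic F1 scores. The sketch below shows the arithmetic; the per-topic numbers in it are made up purely for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_topic: list[tuple[float, float]]) -> float:
    """Average of per-topic F1 scores (macro averaging)."""
    return sum(f1_score(p, r) for p, r in per_topic) / len(per_topic)


# F1 computed from the reported average precision and recall.
print(round(f1_score(0.836, 0.781), 3))  # -> 0.808

# ILLUSTRATIVE (precision, recall) pairs for two imaginary topics,
# not taken from the model's evaluation.
toy_topics = [(0.9, 0.6), (0.7, 0.9)]
mean_p = sum(p for p, _ in toy_topics) / len(toy_topics)
mean_r = sum(r for _, r in toy_topics) / len(toy_topics)

# The two ways of averaging generally disagree.
print(macro_f1(toy_topics), f1_score(mean_p, mean_r))
```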

Use Cases

This model is particularly well-suited for applications requiring automated PII detection and anonymization, such as:

  • Data Privacy Compliance: Helping organizations comply with data protection regulations by identifying and managing sensitive information.
  • Customer Service Analytics: Anonymizing customer interactions before analysis to protect privacy.
  • Risk Management: Identifying potential data leakage points in textual data.
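
The anonymization use case can be sketched as a post-processing step: once the model has listed the PII strings found in a conversation, each occurrence is replaced with a placeholder. The function name redact, the placeholder text, and the assumption that the model output has already been parsed into a list of strings are all illustrative choices, not part of the model's interface.

```python
import re


def redact(text: str, pii_values: list[str], placeholder: str = "[REDACTED]") -> str:
    """Replace every occurrence of each PII string with a placeholder."""
    # Drop empty entries and duplicates; replace longer strings first so that
    # "Jane Doe" is handled before a shorter overlapping value such as "Jane".
    cleaned = {v.strip() for v in pii_values if v.strip()}
    for value in sorted(cleaned, key=len, reverse=True):
        # re.escape treats the value literally; the lambda avoids interpreting
        # backslashes in the placeholder as regex group references.
        text = re.sub(re.escape(value), lambda _m: placeholder, text, flags=re.IGNORECASE)
    return text


# Hypothetical conversation line and PII list, as the model might return it.
message = "My name is Jane Doe and my account number is 12345678."
print(redact(message, ["Jane Doe", "12345678"]))
```

Exact string replacement only removes values the model actually listed, so the recall figure reported for the model bounds how much PII this step can catch.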