Model Overview
The dcipheranalytics/phi-2-pii-bbi model is a language model built on the Microsoft Phi-2 architecture, which has 2.7 billion parameters (often rounded to 3 billion). It specializes in identifying and extracting Personally Identifiable Information (PII) from text, with a particular focus on banking and insurance data.
Key Capabilities
- PII Extraction: Designed to accurately pinpoint and list PII within given text inputs.
- Domain-Specific Training: Fine-tuned on a proprietary dataset of GPT-4-generated customer service conversations covering 200 banking topics and 100 insurance topics, which targets it at these two sectors.
- Conversational Data Processing: Optimized for handling and analyzing conversational text, making it suitable for customer interaction logs and support tickets.
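A minimal sketch of how the extraction capability might be invoked with the Hugging Face `transformers` library. It assumes the repository can be loaded with `AutoModelForCausalLM` and a recent `transformers` release with native Phi support. The prompt wording and the helper names `build_prompt` and `extract_pii` are hypothetical, since the prompt format used during fine-tuning is not described here.

```python
MODEL_ID = "dcipheranalytics/phi-2-pii-bbi"


def build_prompt(conversation: str) -> str:
    """Wrap a conversation in an instruction (hypothetical wording)."""
    return (
        "List the personally identifiable information in the following "
        "conversation.\n\n"
        f"Conversation:\n{conversation.strip()}\n\n"
        "PII:\n"
    )


def extract_pii(conversation: str, max_new_tokens: int = 128) -> str:
    """Generate the model's PII listing for one conversation.

    transformers is imported lazily so build_prompt works without it.
    The model is reloaded on every call here for brevity; a real
    pipeline would load it once and reuse it.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(conversation), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is chosen so that the same conversation yields the same listing on every run, which tends to matter more for extraction than for open-ended generation.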
Performance Metrics
Evaluation results indicate strong performance in PII identification:
- Average Performance: Precision, recall, and F1-score average 0.836, 0.781, and 0.802, respectively, across its training data.
- Topic-Specific Performance: Detailed per-topic evaluation shows varied but generally robust performance across different PII categories.
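The relationship between the averaged figures can be illustrated with a short calculation. The sketch below assumes the averages are macro-averages, where each topic is scored first and the scores are then averaged; the per-topic pairs in `example_topics` are hypothetical and not taken from the evaluation. Under that assumption, the averaged F1 need not equal the F1 computed from the averaged precision and recall, which would be consistent with the reported 0.802 sitting slightly below the roughly 0.808 obtained from 0.836 and 0.781.

```python
from statistics import mean


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_topic):
    """Average precision, recall, and F1 over topics, scoring each topic first."""
    precisions = [p for p, _ in per_topic]
    recalls = [r for _, r in per_topic]
    f1s = [f1(p, r) for p, r in per_topic]
    return mean(precisions), mean(recalls), mean(f1s)


# Hypothetical per-topic (precision, recall) pairs, for illustration only.
example_topics = [(0.95, 0.90), (0.60, 0.85), (0.90, 0.55)]
avg_p, avg_r, avg_f1 = macro_average(example_topics)

# F1 of the averaged precision and recall; generally differs from avg_f1.
f1_of_averages = f1(avg_p, avg_r)

# The same check applied to the reported averages.
reported_f1_of_averages = f1(0.836, 0.781)
print(round(reported_f1_of_averages, 3))  # 0.808
```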
Use Cases
This model is particularly well-suited for applications requiring automated PII detection and anonymization, such as:
- Data Privacy Compliance: Helping organizations comply with data protection regulations by identifying and managing sensitive information.
- Customer Service Analytics: Anonymizing customer interactions before analysis to protect privacy.
- Risk Management: Identifying potential data leakage points in textual data.
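For the anonymization use cases above, one possible post-processing step is to replace each extracted PII string with a placeholder. The sketch below assumes the model's output has already been parsed into a list of strings; the function name `redact` and the `[REDACTED]` placeholder are illustrative choices, not part of the model.

```python
import re


def redact(text: str, pii_values: list[str], placeholder: str = "[REDACTED]") -> str:
    """Replace each extracted PII string in text with a placeholder."""
    # Longest values first so a value that contains another is replaced whole.
    values = sorted(
        {v.strip() for v in pii_values if v.strip()}, key=len, reverse=True
    )
    if not values:
        return text
    pattern = re.compile(
        "|".join(re.escape(v) for v in values), flags=re.IGNORECASE
    )
    # A function replacement keeps the placeholder literal (no backslash escapes).
    return pattern.sub(lambda _match: placeholder, text)


message = "Hi, I'm Jane Doe and my policy number is PN-48213."
print(redact(message, ["Jane Doe", "PN-48213"]))
# Hi, I'm [REDACTED] and my policy number is [REDACTED].
```

Exact string matching only removes what the model listed, so any PII the model misses (recall is below 1.0) stays in the text; a second check is advisable before sharing redacted data.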