Overview
This model, naazimsnh02/qwen3-0.6b-pii-detector, is a 0.8 billion parameter fine-tune of the Qwen3-0.6B base model. It was fine-tuned with LoRA using Unsloth on the nvidia/Nemotron-PII dataset (47,500 training samples) to perform Named Entity Recognition (NER) for personally identifiable information (PII) and protected health information (PHI).
Key Capabilities
- PII/PHI Detection: Identifies over 55 types of sensitive information, including personal identifiers, contact details, medical information, financial data, and digital identifiers.
- Inline Tagging: Outputs detected entities inline in an [entity]label format, which makes extraction and downstream processing straightforward.
- Context-Aware: Optionally accepts domain (e.g., healthcare, finance), document type, and locale (US/international) information at inference time to improve accuracy.
- Natural Language Processing: Designed to work effectively across conversations, documents, forms, and unstructured text.
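As a minimal sketch of how the inline tags might be extracted downstream, the snippet below parses tagged text with a regular expression. It assumes the label follows the closing bracket directly as a run of letters, digits, or underscores, as the format is written above. The function name `extract_entities`, the sample string, and its label names are hypothetical and not taken from the model's documentation, so the pattern may need adjusting to the model's actual output.

```python
import re

# Assumed tag shape: [entity text]label, with the label made of
# letters, digits, or underscores. This is an assumption, not a documented spec.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]([A-Za-z_][A-Za-z0-9_]*)")


def extract_entities(tagged_text):
    """Return (entity_text, label) pairs found in tagged model output."""
    return [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(tagged_text)]


# Hypothetical tagged output; the label names are illustrative only.
sample = "Contact [Jane Doe]first_name at [jane@example.com]email today."
print(extract_entities(sample))
```

Returning plain tuples keeps the result easy to feed into a redaction step or to group by label.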
Performance & Training
Training ran for 2.096 epochs, reaching a final training loss of 0.4155 and a best evaluation loss of 0.4551. The model was trained with a maximum sequence length of 2048 tokens on an NVIDIA L4 GPU.
Limitations
- Context Dependency: Accuracy is best when domain context is provided.
- Language: Primarily trained and optimized for English text.
- Ambiguity: May struggle with ambiguous entities or novel types not in its training data.
Recommended Use Cases
- Automated redaction and data anonymization pipelines.
- Compliance monitoring for regulations like GDPR, HIPAA, and CCPA.
- Document sanitization before sharing or processing.
- Privacy-preserving data analysis.
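For the redaction use case, one possible post-processing step is to replace each tagged entity with a placeholder derived from its label. The sketch below assumes the model's output uses an [entity]label shape with the label made of letters, digits, or underscores. The `redact` helper, the placeholder style, and the sample labels are illustrative assumptions rather than part of the model's documented interface.

```python
import re

# Assumed tag shape: [entity text]label (an assumption about the output format).
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]([A-Za-z_][A-Za-z0-9_]*)")


def redact(tagged_text):
    """Replace each tagged entity with an upper-case placeholder such as <EMAIL>."""
    return TAG_PATTERN.sub(lambda m: "<" + m.group(2).upper() + ">", tagged_text)


# Hypothetical tagged output; the label names are illustrative only.
sample = "Patient [John Smith]name can be reached at [555-0100]phone_number."
print(redact(sample))
```

Keeping the label in the placeholder preserves the kind of information that was removed, which can help when reviewing redacted documents.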