betterdataai/PII_DETECTION_MODEL

0.5B parameters · BF16 weights · 32,768-token context · Available on Hugging Face

Overview

This model, developed by Betterdata.ai, is a 0.5-billion-parameter decoder-only transformer fine-tuned from Qwen2-0.5B for PII detection. It identifies 29 distinct PII classes across seven languages: English, Spanish, Swedish, German, Italian, Dutch, and French. Its small size keeps latency low and lets it run on CPU, which makes it practical for real-time privacy applications. It supports a context length of 32,768 tokens.

Key Capabilities

  • Multilingual PII Detection: Covers 29 PII classes in 7 languages.
  • Privacy Enhancement: Masks detected PII with class tags, so downstream models can still use the surrounding context without seeing the sensitive values.
  • Lightweight & Efficient: Built on Qwen2-0.5B, designed for low latency and CPU deployment.
  • Broad Application: Helps developers and organizations prevent PII leakage when integrating third-party APIs or using public chat interfaces.
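Masking with class tags can be sketched as follows. This is a minimal illustration that assumes the detector returns character-offset spans of the form (start, end, label). The model card does not document the model's actual output format, so the span format, the bracketed tag style, and the labels NAME and EMAIL are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label), character offsets, end exclusive.
Span = Tuple[int, int, str]


def mask_pii(text: str, spans: List[Span]) -> str:
    """Replace each PII span with a bracketed class tag such as "[EMAIL]"."""
    masked = text
    boundary = len(text)  # start of the leftmost span replaced so far
    # Work right-to-left so the offsets of spans not yet replaced stay valid.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        if end > boundary:
            continue  # overlaps a span already replaced; skip it
        masked = masked[:start] + f"[{label}]" + masked[end:]
        boundary = start
    return masked


if __name__ == "__main__":
    text = "Contact Anna Svensson at anna@example.com."
    name = "Anna Svensson"
    email = "anna@example.com"
    # Spans a detector might return (labels are hypothetical).
    spans = [
        (text.index(name), text.index(name) + len(name), "NAME"),
        (text.index(email), text.index(email) + len(email), "EMAIL"),
    ]
    print(mask_pii(text, spans))  # Contact [NAME] at [EMAIL].
```

Replacing spans from right to left means the detector's original offsets can be used directly, with no need to shift them after each substitution.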

Good For

  • Data Masking: Replacing sensitive PII with generic class tags to maintain privacy.
  • Privacy-Preserving AI: Enabling AI models to process data without direct exposure to personal information.
  • Applications with Third-Party APIs: Securing data when interacting with external services.
  • CPU-Constrained Environments: Running PII detection efficiently on less powerful hardware.
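The third-party API use case can be sketched as a wrapper that masks text before it leaves the application. This is a hedged illustration, not the model's integration API. In practice, `detect` would be backed by the model, but its output format is not documented here. The email regex is only a stand-in so the example runs. `guarded_call`, `regex_email_detector`, and the `send` callable are hypothetical names.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # hypothetical: (start, end, label)

# Stand-in detector for illustration only: an email regex, NOT the model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_email_detector(text: str) -> List[Span]:
    return [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]


def guarded_call(
    text: str,
    detect: Callable[[str], List[Span]],
    send: Callable[[str], str],
) -> str:
    """Mask the spans found by `detect`, then pass only masked text to `send`."""
    masked = text
    # Assumes non-overlapping spans; replace right-to-left to keep offsets valid.
    for start, end, label in sorted(detect(text), reverse=True):
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return send(masked)


if __name__ == "__main__":
    outbound: List[str] = []

    def fake_api(prompt: str) -> str:
        # Placeholder for a real third-party API client.
        outbound.append(prompt)
        return "ok"

    guarded_call("Email bob@example.org about the invoice.", regex_email_detector, fake_api)
    print(outbound)  # ['Email [EMAIL] about the invoice.']
```

Detection is passed in as a plain function parameter, so the regex stand-in could be swapped for a model-backed detector without changing the wrapper.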

Limitations

The model is effective for common PII such as names and email addresses, but its accuracy on classes such as API keys, credit card CVV numbers, and bank account numbers is still being improved. Future updates aim to improve performance and may replace PII text with synthetic values rather than class tags.