Distil-PII-Llama-3.2-1B-Instruct is a 1 billion parameter small language model developed by Distil Labs, fine-tuned from Llama-3.2-1B-Instruct. This model specializes in policy-aware PII redaction, outputting a single JSON object with redacted text and identified entities. It is optimized for local deployment and designed for tasks like redacting support chats, logs, and tickets while preserving operational signals.
Distil-PII-Llama-3.2-1B-Instruct Overview
This model, developed by Distil Labs, is a 1 billion parameter small language model (SLM) fine-tuned from meta-llama/Llama-3.2-1B-Instruct. Its core function is policy-aware PII redaction, designed to identify and replace sensitive personal data within text while maintaining crucial operational information.
Key Capabilities
- Precise PII Redaction: Identifies and redacts various PII types including names, emails, phone numbers, addresses, SSNs, national IDs, UUIDs, credit card numbers (last-4 preserved), IBANs (last-4 preserved), gender, age, race, and marital status.
- Structured JSON Output: Generates a single JSON object containing the redacted_text with in-place tokens and an entities array detailing the original value, replacement token, and reason for redaction.
- Local Optimization: Engineered for efficient local deployment, making it suitable for privacy-sensitive applications where data cannot leave the local environment.
- Schema Adherence: Instruction-tuned on curated examples to ensure strict adherence to the specified JSON output schema.
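As a minimal sketch of how a caller might consume the structured output described above, the following Python parses a raw model response and checks its shape. The top-level keys redacted_text and entities come from the description; the per-entity key names value and replacement_token follow the evaluation wording, while reason is an assumed key name for the "reason for redaction" field. The helper name parse_redaction_output and the sample string are hypothetical and not part of the model's tooling.

```python
import json

# Assumed per-entity keys; "reason" in particular is a guess at the field name.
REQUIRED_ENTITY_KEYS = {"value", "replacement_token", "reason"}


def parse_redaction_output(raw: str) -> dict:
    """Parse a model response and verify it looks like the described schema.

    Raises ValueError if the response is not a single JSON object or if
    expected fields are missing.
    """
    # json.loads rejects any text before or after the JSON value, so a
    # chatty prefix such as "Sure! {...}" fails here (JSONDecodeError is a
    # subclass of ValueError).
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a single JSON object")
    if not isinstance(obj.get("redacted_text"), str):
        raise ValueError("missing or non-string 'redacted_text'")
    entities = obj.get("entities")
    if not isinstance(entities, list):
        raise ValueError("missing or non-list 'entities'")
    for i, entity in enumerate(entities):
        if not isinstance(entity, dict):
            raise ValueError(f"entity {i} is not an object")
        missing = REQUIRED_ENTITY_KEYS - entity.keys()
        if missing:
            raise ValueError(f"entity {i} is missing keys: {sorted(missing)}")
    return obj


# Hypothetical response; the token format "[PERSON]" is illustrative only.
sample = (
    '{"redacted_text": "Hi, I am [PERSON].", '
    '"entities": [{"value": "Jane Doe", '
    '"replacement_token": "[PERSON]", "reason": "person name"}]}'
)
parsed = parse_redaction_output(sample)
```

Validating before use is a cautious choice here: even a schema-tuned model can occasionally emit malformed output, and a hard failure is usually safer than passing unredacted text downstream.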
Intended Use Cases
This model is particularly well-suited for:
- Redacting sensitive information from customer support chats and tickets.
- Anonymizing logs and transcripts to remove personal identifiers.
- Preserving operational signals (e.g., last-4 digits of cards, order numbers) while redacting full PII.
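To illustrate the last use case, here is a rough sketch of a post-check that confirms card numbers were redacted while their last four digits survived. It is not part of the model; the function name check_last4_preserved, the simplified card-number pattern, and the token format in the usage example are all assumptions made for illustration.

```python
import re

# Assumption: a "card-like" number is 13 to 19 digits, optionally separated
# by single spaces or hyphens. Real card detection is more involved.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def check_last4_preserved(original: str, redacted: str) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for match in CARD_RE.finditer(original):
        raw = match.group(0)
        last4 = re.sub(r"\D", "", raw)[-4:]
        if raw in redacted:
            # The full number leaked through unredacted.
            problems.append(f"full number still present (ending {last4})")
        elif last4 not in redacted:
            # The number was redacted, but the operational signal was lost.
            problems.append(f"last-4 digits lost ({last4})")
    return problems


# Hypothetical texts; "[CARD_LAST4:1234]" is an illustrative token format.
original = "Card 4111 1111 1111 1234 was declined for order 98765."
redacted = "Card [CARD_LAST4:1234] was declined for order 98765."
issues = check_last4_preserved(original, redacted)
```

A check like this only covers the card case; order numbers and other operational signals would need their own patterns.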
Evaluation & Limitations
Evaluated against a frontier LLM using a deterministic rubric, the model achieved a score of 0.82 ± 0.03. The rubric covers four criteria: JSON-only output, schema validity, an exact redacted_text match, and set equality of (value, replacement_token) pairs. The model is primarily designed for English text, and generalization to other languages is not guaranteed. It is not intended for legal or compliance advice.
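The rubric criteria named above can be sketched as a deterministic pass/fail check on one example. This is an assumption-laden illustration, not the evaluation code used by Distil Labs: the source does not say how the criteria are combined, so the sketch simply requires all four to hold, and the function name rubric_pass and the sample data are hypothetical.

```python
import json


def rubric_pass(model_output: str, reference: dict) -> bool:
    """Return True only if the output satisfies all four criteria.

    Assumption: the criteria are combined as a strict conjunction.
    """
    # 1. JSON-only output: any surrounding prose makes json.loads fail.
    try:
        pred = json.loads(model_output)
    except ValueError:
        return False

    # 2. Schema validity: an object with a string and a list in the
    #    expected fields.
    if not isinstance(pred, dict):
        return False
    if not isinstance(pred.get("redacted_text"), str):
        return False
    if not isinstance(pred.get("entities"), list):
        return False

    # 3. Exact redacted_text match against the reference.
    if pred["redacted_text"] != reference["redacted_text"]:
        return False

    # 4. Set equality of (value, replacement_token) pairs, so entity order
    #    and duplicate entries do not matter.
    def pairs(entities):
        return {(e["value"], e["replacement_token"]) for e in entities}

    try:
        return pairs(pred["entities"]) == pairs(reference["entities"])
    except (KeyError, TypeError):
        # A malformed entity counts as a schema failure.
        return False


# Hypothetical reference and model output for illustration.
reference = {
    "redacted_text": "Email [EMAIL] about it.",
    "entities": [{"value": "a@b.com", "replacement_token": "[EMAIL]"}],
}
output = json.dumps(reference)
passed = rubric_pass(output, reference)
```

Averaging such pass/fail results over a test set would yield a score between 0 and 1, which is one plausible reading of the reported figure.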