Distil-PII-Llama-3.2-1B-Instruct Overview

This model, developed by Distil Labs, is a 1-billion-parameter small language model (SLM) fine-tuned from meta-llama/Llama-3.2-1B-Instruct. Its core function is policy-aware PII redaction: it identifies sensitive personal data in text and replaces it with tokens while preserving the operational information that downstream workflows still need.

Key Capabilities

Precise PII Redaction: Identifies and redacts various PII types including names, emails, phone numbers, addresses, SSNs, national IDs, UUIDs, credit card numbers (last-4 preserved), IBANs (last-4 preserved), gender, age, race, and marital status.
Structured JSON Output: Generates a single JSON object containing redacted_text (the input with replacement tokens substituted in place) and an entities array that records, for each redaction, the original value, the replacement token, and the reason for redaction.
Local Optimization: Engineered for efficient local deployment, making it suitable for privacy-sensitive applications where data cannot leave the local environment.
Schema Adherence: Instruction-tuned on curated examples to ensure strict adherence to the specified JSON output schema.
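
The JSON output described above can be checked on the consuming side before it is trusted. The following is a minimal sketch, not the model's official schema: the entity key names value, replacement_token, and reason are assumptions inferred from the description, and the bracketed token style ([PERSON], [EMAIL]) and the sample text are purely illustrative.

```python
import json

# Assumed key names for each entry in "entities"; the source only says each
# entry carries the original value, the replacement token, and a reason.
REQUIRED_ENTITY_KEYS = {"value", "replacement_token", "reason"}


def validate_redaction_output(raw: str) -> dict:
    """Parse the model's raw output and check it against the assumed schema."""
    obj = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(obj, dict):
        raise ValueError("expected a single JSON object")
    if not isinstance(obj.get("redacted_text"), str):
        raise ValueError("missing or non-string 'redacted_text'")
    entities = obj.get("entities")
    if not isinstance(entities, list):
        raise ValueError("missing or non-list 'entities'")
    for ent in entities:
        if not isinstance(ent, dict):
            raise ValueError("each entity must be a JSON object")
        missing = REQUIRED_ENTITY_KEYS - ent.keys()
        if missing:
            raise ValueError(f"entity is missing keys: {sorted(missing)}")
        # Tokens are placed in the text itself, so each one should appear there.
        if ent["replacement_token"] not in obj["redacted_text"]:
            raise ValueError("replacement token not found in redacted_text")
    return obj


# Hypothetical output for the input
# "Hi, this is Jane Doe. You can reach me at jane@example.com."
sample = json.dumps({
    "redacted_text": "Hi, this is [PERSON]. You can reach me at [EMAIL].",
    "entities": [
        {"value": "Jane Doe", "replacement_token": "[PERSON]", "reason": "person name"},
        {"value": "jane@example.com", "replacement_token": "[EMAIL]", "reason": "email address"},
    ],
})

result = validate_redaction_output(sample)
print(result["redacted_text"])
```

Rejecting malformed output early keeps a downstream pipeline from storing text that was only partially redacted.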

Intended Use Cases

This model is particularly well-suited for:

Redacting sensitive information from customer support chats and tickets.
Anonymizing logs and transcripts to remove personal identifiers.
Preserving operational signals (e.g., last-4 digits of cards, order numbers) while redacting full PII.
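
To make the "last-4 preserved" policy concrete, here is a small rule-based sketch of the intended behavior. It is only an illustration of the policy, not how the model works internally: the regular expression, the function name redact_card_keep_last4, and the [CARD_LAST4:....] token format are all assumptions made for this example.

```python
import re

# Illustrative pattern: 13 to 19 digits, optionally separated by spaces or
# hyphens. Real card detection is more involved; this is a sketch only.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def redact_card_keep_last4(text: str) -> str:
    """Replace card-like numbers with a token that keeps only the last 4 digits."""
    def repl(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group(0))  # drop separators
        return f"[CARD_LAST4:{digits[-4:]}]"        # assumed token format
    return CARD_RE.sub(repl, text)


message = "Card 4111 1111 1111 1111 was charged for order 98765."
print(redact_card_keep_last4(message))
```

The short order number is left untouched because it does not match the card-length pattern, which mirrors the goal of keeping operational signals while removing the full card number.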

Evaluation & Limitations

In an evaluation that used a frontier LLM and a deterministic rubric, the model achieved a score of 0.82 ± 0.03. The rubric covers four criteria: JSON-only output, schema validity, an exact redacted_text match, and set-equality of (value, replacement_token) pairs. The model is designed primarily for English text, and generalization to other languages is not guaranteed. It is not intended to provide legal or compliance advice.
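
The four rubric criteria can be sketched as a single pass/fail check against a reference answer. This is a hedged illustration rather than the evaluation code actually used: the function name rubric_pass and the key names value and replacement_token are assumptions, and the source does not say how individual checks are aggregated into the reported score.

```python
import json


def rubric_pass(raw_prediction: str, reference: dict) -> bool:
    """Return True only if all four rubric criteria hold for one example."""
    # 1. JSON-only output: any surrounding prose makes parsing fail.
    try:
        pred = json.loads(raw_prediction)
    except json.JSONDecodeError:
        return False

    # 2. Schema validity (minimal version): an object with the two expected fields.
    if not isinstance(pred, dict):
        return False
    if not isinstance(pred.get("redacted_text"), str):
        return False
    if not isinstance(pred.get("entities"), list):
        return False

    # 3. Exact redacted_text match.
    if pred["redacted_text"] != reference["redacted_text"]:
        return False

    # 4. Set-equality of (value, replacement_token) pairs; order is ignored.
    def pairs(obj: dict) -> set:
        return {(e["value"], e["replacement_token"]) for e in obj["entities"]}

    try:
        return pairs(pred) == pairs(reference)
    except (KeyError, TypeError):
        return False  # malformed entity entries count as a failure


# Hypothetical reference and predictions for illustration.
reference = {
    "redacted_text": "Call [PHONE] and ask for [PERSON].",
    "entities": [
        {"value": "555-0100", "replacement_token": "[PHONE]"},
        {"value": "Sam Lee", "replacement_token": "[PERSON]"},
    ],
}
good = json.dumps({
    "redacted_text": "Call [PHONE] and ask for [PERSON].",
    "entities": [
        {"value": "Sam Lee", "replacement_token": "[PERSON]", "reason": "person name"},
        {"value": "555-0100", "replacement_token": "[PHONE]", "reason": "phone number"},
    ],
})
chatty = "Here is the result: " + good

print(rubric_pass(good, reference), rubric_pass(chatty, reference))
```

Because the pairs are compared as sets, a prediction that lists the entities in a different order still passes, while any text outside the JSON object fails the first criterion.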

Full Model Card (README)