nvidia/Privasis-Cleaner-4B
Privasis-Cleaner-4B is a 4 billion parameter decoder-only Transformer model developed by NVIDIA, built upon Qwen3 4B Instruct. It is specifically fine-tuned for text sanitization, capable of removing or abstracting sensitive information based on user-provided instructions. This model excels at preprocessing text for privacy-preserving research, content sanitization, and compliance pipelines by generating cleaned versions of raw text.
Loading preview...
Overview
Privasis-Cleaner-4B is a 4 billion parameter text-sanitization model developed by NVIDIA, based on the Qwen3 4B Instruct architecture. Its core function is to remove or abstract sensitive information from text according to user-defined sanitization instructions. The model was fine-tuned on 37,000 instruction–input–output triplets, enabling it to produce compliant, cleaned text.
Key Capabilities
- Instruction-driven Sanitization: Users provide specific instructions (e.g., "Remove all person names, exact dates, and exact locations") to guide the sanitization process.
- Privacy-Preserving: Designed for automatic redaction of PII/PHI, making it suitable for sensitive data handling.
- Lightweight: At 4 billion parameters, it offers a balance between performance and computational efficiency.
- Synthetic Data Training: Trained and tested on synthetic text-based triplets, ensuring no personal data was used in its development.
Use Cases
- Data Preprocessing: Ideal for preparing datasets for privacy-preserving research.
- Content Moderation: Sanitizing content to meet compliance standards (e.g., GDPR, HIPAA).
- Automated Redaction: Automatically removing sensitive entities from text streams or documents.
This model is intended for research and non-commercial use, with deployment supported globally. Further details on its underlying research can be found in the Privasis paper.