HaS Text Model: On-Device Privacy Pipeline
The HaS (Hide and Seek) Text Model by xuanwulab is a 0.6-billion-parameter model, distributed unquantized in FP16, designed for on-device data privacy. Unlike traditional anonymization methods that rely on simple pattern matching, HaS offers an "agentic privacy pipeline" of composable atomic capabilities that address multi-turn consistency, reversible restoration, and the usability of data after anonymization.
Key Capabilities
- 3-Level Semantic Tags: Anonymizes data with structured tags like <Amount[1].ContractAmount.NumberSymbol> instead of generic [REDACTED], preserving data usability for downstream LLMs.
- Coreference Resolution: Unifies different forms of the same entity (e.g., "CloudGenius Inc.", "CloudGenius") under a single ID, ensuring logical coherence across text.
- Multi-turn Consistency: Maintains consistent entity IDs across turns and document chunks using historical mapping dictionaries, crucial for long documents and conversations.
- Reversible Restoration: Allows anonymized text to be processed by cloud LLMs, with the ability to restore original values using the "Seek" capability.
- Open-set Entity Types: Trained on approximately 70,000 entity types, enabling users to specify any type name without predefined category limitations.
- On-Device & Multilingual: Operates locally, ensuring data never leaves the device, and supports 8 languages: Chinese, English, Portuguese, French, Spanish, German, Korean, Japanese.
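The tagging, coreference, multi-turn, and restoration capabilities above can be illustrated with a small sketch. This is not the HaS API: the model itself decides which spans are entities and which forms refer to the same one, whereas here that information is supplied by hand. The names MappingStore, hide, and seek are hypothetical, as are the Organization/Company/FullName labels. Reading the three tag levels as category with ID, sub-category, and value format is an assumption based on the single example tag in the text.

```python
import re
from dataclasses import dataclass, field

# Assumed tag shape, inferred from the one example in the text:
# <Category[id].SubCategory.Format>
TAG_RE = re.compile(r"<(\w+)\[(\d+)\]\.(\w+)\.(\w+)>")


@dataclass
class MappingStore:
    """Hypothetical history dictionary shared across turns or chunks."""
    tag_by_key: dict = field(default_factory=dict)    # (category, canonical) -> tag
    value_by_tag: dict = field(default_factory=dict)  # tag -> canonical value
    last_id: dict = field(default_factory=dict)       # category -> last ID issued

    def tag_for(self, canonical, category, sub, fmt):
        key = (category, canonical)
        if key not in self.tag_by_key:
            n = self.last_id.get(category, 0) + 1
            self.last_id[category] = n
            tag = f"<{category}[{n}].{sub}.{fmt}>"
            self.tag_by_key[key] = tag
            self.value_by_tag[tag] = canonical
        return self.tag_by_key[key]


def hide(text, entities, store):
    """Replace each known surface form with the tag of its canonical entity.

    entities: tuples of (surface, canonical, category, sub, fmt).
    """
    by_surface = {e[0]: e for e in entities}
    if not by_surface:
        return text
    # Longest alternatives first, so "CloudGenius Inc." wins over "CloudGenius".
    surfaces = sorted(by_surface, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in surfaces))
    return pattern.sub(lambda m: store.tag_for(*by_surface[m.group(0)][1:]), text)


def seek(text, store):
    """Restore the canonical value for every known tag; leave unknown tags alone."""
    return TAG_RE.sub(lambda m: store.value_by_tag.get(m.group(0), m.group(0)), text)


store = MappingStore()
entities = [
    ("CloudGenius Inc.", "CloudGenius Inc.", "Organization", "Company", "FullName"),
    ("CloudGenius", "CloudGenius Inc.", "Organization", "Company", "FullName"),
    ("$120,000", "$120,000", "Amount", "ContractAmount", "NumberSymbol"),
]

# Two turns share one store, so the company keeps the same ID in both.
turn1 = hide("CloudGenius Inc. signed for $120,000.", entities, store)
turn2 = hide("CloudGenius will pay in March.", entities, store)
restored = seek(turn2, store)
```

In this sketch both spellings of the company map to one tag, and the second turn reuses the ID issued in the first because the store persists between calls. Restoration returns the canonical form rather than the exact surface form that was replaced, a simplification the real model may not share.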
Good For
- Secure Data Sharing: Automatically anonymize files, emails, or code before sharing, with the option to restore original content.
- Privacy Knowledge Bases: Anonymize documents before ingestion into knowledge bases and restore query results on demand.
- Secure Cloud Interactions: Anonymize text before sending to cloud-based LLMs and restore LLM responses for privacy-preserving workflows.
- AI Agent Memory Privacy: Store long-term agent memory in an anonymized format, restoring it only when needed.
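The cloud-interaction workflow above follows an anonymize, send, restore pattern, sketched below under stated assumptions. The functions anonymize, restore, and private_call are hypothetical stand-ins that use plain string replacement over a hand-written mapping; in practice the on-device model would produce the mapping. The tag names are invented for illustration, and fake_cloud_llm is a stub rather than a real service.

```python
from typing import Callable, Dict, List


def anonymize(text: str, mapping: Dict[str, str]) -> str:
    """Stand-in for the on-device Hide step: swap known values for tags."""
    # Longest values first so a shorter value never clobbers part of a longer one.
    for value in sorted(mapping, key=len, reverse=True):
        text = text.replace(value, mapping[value])
    return text


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Stand-in for the Seek step: swap tags back to the original values."""
    for value, tag in mapping.items():
        text = text.replace(tag, value)
    return text


def private_call(prompt: str, mapping: Dict[str, str],
                 cloud_llm: Callable[[str], str]) -> str:
    """Send only the anonymized prompt off-device, then restore the reply locally."""
    safe_prompt = anonymize(prompt, mapping)
    safe_reply = cloud_llm(safe_prompt)
    return restore(safe_reply, mapping)


# Stub "cloud" model that records what it received and echoes it back.
sent: List[str] = []


def fake_cloud_llm(prompt: str) -> str:
    sent.append(prompt)
    return "Summary: " + prompt


# Hypothetical mapping; tag names are illustrative only.
mapping = {
    "Alice Zhang": "<Person[1].Name.FullName>",
    "alice@example.com": "<Email[1].Work.Address>",
}

reply = private_call("Email Alice Zhang at alice@example.com.", mapping, fake_cloud_llm)
```

Here the stub only ever sees the tagged prompt, while the caller gets back a reply with the original values in place. The same mapping could be kept alongside anonymized documents or agent memory and applied only when a restored view is needed.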