eternisai/Anonymizer-4B

Warm
Public
4B
BF16
40960
License: cc-by-nc-4.0
Hugging Face
Overview

Model Overview

The eternisai/Anonymizer-4B is a 4 billion parameter language model, part of the Enchanted anonymizer series, developed by eternisai. Built upon the Qwen3-4B architecture, this model is specifically designed for high-accuracy anonymization of Personally Identifiable Information (PII).

Key Capabilities

  • High-Accuracy PII Replacement: The model identifies and replaces PII with semantically equivalent alternatives, preserving context while enhancing privacy. It achieves a 9.55/10 score on anonymization quality.
  • Efficient Performance: Despite its strong performance, it offers low latency, with Time To First Token (TTFT) under 250ms and full completion under 2 seconds when quantized.
  • Structured Output: It generates structured JSON outputs via tool calls, detailing original PII and its anonymized replacements.

Training Details

Anonymizer-4B was trained using Supervised Fine-Tuning (SFT) followed by GRPO (Generative Reinforcement Learning from PPO) with GPT-4.1 acting as the judge. The training dataset comprised approximately 30,000 samples covering various PII replacement and non-replacement scenarios.

Intended Use Cases

  • Primary: Integrated as a high-accuracy anonymizer within the Enchanted platform.
  • Secondary: Suitable for enterprise and research deployments where top-tier anonymization quality is critical.

Important Usage Notes

  • Chat Template Required: The model necessitates the use of tokenizer.apply_chat_template() with a specific tool schema; raw prompts are not supported.
  • Special Marker: User queries must include the /no_think marker for proper PII detection.

Limitations

As the largest model in its series, Anonymizer-4B requires MacBook-class hardware or above for real-time inference and is not optimized for mobile devices as of August 2025.