daxa-ai/qwen-synthetic-v1-ckpt-500
Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 2, 2026Architecture:Transformer Warm

The daxa-ai/qwen-synthetic-v1-ckpt-500 model is a 4 billion parameter Qwen3-Instruct variant, fine-tuned by daxa-ai for specialized Personally Identifiable Information (PII) Named Entity Recognition. This model excels at extracting 29 specific PII entity types from unstructured text and outputs them as structured JSON. It is optimized for precise PII detection, offering high precision (0.9956) and F1 score (0.9362) on its evaluation dataset.

Loading preview...

Model Overview

The daxa-ai/qwen-synthetic-v1-ckpt-500 is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. Its primary function is Personally Identifiable Information (PII) Named Entity Recognition (NER). The model is specifically designed to identify and extract 29 distinct PII entity types from input text, presenting the results in a structured JSON format.

Key Capabilities

  • Specialized PII Extraction: Detects a comprehensive list of PII entities including CREDIT_CARD, US_SSN, EMAIL, PHONE, DATE_OF_BIRTH, IP_ADDRESS, MEDICAL_RECORD_NUMBER, and various national IDs like INDIA_AADHAAR, US_PASSPORT, and HONG_KONG_ID.
  • Structured Output: Generates a JSON object where each PII entity type is a key, and its detected values are listed in an array. Empty arrays are returned for entity types not found.
  • High Performance: Achieves a precision of 0.9956 and an F1 score of 0.9362 on its evaluation dataset, indicating strong accuracy in PII detection.
  • Conversational Format: Trained using a 3-turn system → user → assistant chat template, compatible with Qwen's instruction format.

When to Use This Model

This model is ideal for applications requiring precise and structured extraction of PII from text. It is particularly well-suited for:

  • Data Anonymization/Redaction: Automatically identifying and flagging sensitive PII for removal or masking.
  • Compliance and Privacy: Ensuring adherence to data protection regulations by accurately locating PII within documents.
  • Information Extraction: Building systems that need to parse and categorize specific personal data points from large text corpora.