The metricspace/GDPR_Input_Detection_and_Anonymization_0.5B is a 0.5 billion parameter model designed to act as a firewall or proxy for user inputs to external LLMs. It analyzes prompts to provide a complexity score, guiding the selection of appropriate LLMs, and a sensitivity score to detect and anonymize confidential information. This model specializes in local data protection and anonymization, supporting 29 languages with a context length of 131072 tokens.
Model Overview
Running locally as a firewall or proxy, the model inspects each prompt before it is forwarded to a larger, external AI model and returns two critical scores: complexity, which guides model selection, and sensitivity, which drives data protection.
Key Capabilities
- Complexity Scoring: Rates task complexity from 1 to 10, helping users select the most cost-effective and appropriate LLM (e.g., smaller models for low scores, powerful models like GPT-4o for high scores). This optimizes resource usage and reduces costs.
- Sensitivity Scoring: Assesses prompt confidentiality from 0 (public) to 3 (highly critical), enabling blocking or anonymization of sensitive data to prevent unauthorized exposure and ensure GDPR compliance.
- Anonymization and Re-Anonymization: Detects and replaces specific entities (e.g., locations, names, dates) based on configurable settings, allowing for secure processing by external LLMs and subsequent restoration of original entities.
- Multilingual Support: Trained with a mixture of English (80%) and multilingual (20%) examples, supporting 29 languages.
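Taken together, the two scores suggest a simple routing policy. The sketch below is illustrative only: the thresholds, model names, and action labels are assumptions for demonstration, not values prescribed by the model card.

```python
def route_prompt(complexity: int, sensitivity: int) -> str:
    """Map the model's two scores to a downstream action.

    complexity:  1 (trivial) .. 10 (hard)      -- picks the target LLM
    sensitivity: 0 (public)  .. 3 (critical)   -- picks the protection step
    Thresholds and model names here are hypothetical examples.
    """
    if sensitivity >= 3:
        # Highly critical content never leaves the local machine.
        return "block"
    # Confidential content (1-2) is anonymized before being sent out.
    action = "anonymize-then-send" if sensitivity >= 1 else "send"
    # Cheap model for simple tasks, a powerful model (e.g. GPT-4o) otherwise.
    target = "small-local-llm" if complexity <= 4 else "gpt-4o"
    return f"{action}:{target}"
```

For example, a simple public question would be routed straight to the small model, while a complex prompt containing personal data would be anonymized first and then sent to the more capable model.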
Good For
- Protecting sensitive data: Ideal for applications requiring local pre-processing of user inputs to remove or anonymize confidential information before sending to cloud-based LLMs.
- Optimizing LLM usage: Helps in dynamically selecting the right LLM based on task complexity, leading to cost savings and efficient resource allocation.
- GDPR compliance: Provides a mechanism to handle personal and confidential data in accordance with privacy regulations by preventing its direct exposure to external models.
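The anonymization round trip can be sketched as a placeholder substitution: detected entities are masked before the prompt leaves the machine, and the external LLM's response is restored afterwards. In this minimal sketch the entity list is supplied by hand; in practice it would come from the 0.5B model's entity detection output.

```python
def anonymize(text: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each detected entity with a numbered placeholder.

    Returns the masked text and a mapping needed to restore the
    original entities later (re-anonymization).
    """
    mapping: dict[str, str] = {}
    for i, entity in enumerate(entities):
        placeholder = f"[ENTITY_{i}]"
        mapping[placeholder] = entity
        text = text.replace(entity, placeholder)
    return text, mapping


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original entities in the external LLM's response."""
    for placeholder, entity in mapping.items():
        text = text.replace(placeholder, entity)
    return text
```

A masked prompt like `"[ENTITY_0] lives in [ENTITY_1]."` can be processed by the external model, and the placeholders in its answer are swapped back locally, so the cloud provider never sees the original names or locations.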
Limitations
For complexity and sensitivity scoring, the model processes inputs up to 2,048 tokens. For entity detection, the combined input and output limit is 3,000 tokens. Exceeding these limits may lead to truncated outputs or inconsistent behavior.
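A pre-flight check against these limits can avoid truncated or inconsistent outputs. The guard below is a sketch: the token counts would normally come from the model's own tokenizer, and the output budget for entity detection is an assumed parameter.

```python
SCORING_LIMIT = 2048   # max input tokens for complexity/sensitivity scoring
ENTITY_LIMIT = 3000    # combined input + output tokens for entity detection


def fits_limits(n_input_tokens: int, task: str, n_output_budget: int = 0) -> bool:
    """Return True if a request fits the model's documented token limits.

    task: "scoring" (complexity/sensitivity) or "entity" (entity detection).
    For entity detection the limit covers input AND output combined, so an
    expected output budget must be reserved up front.
    """
    if task == "scoring":
        return n_input_tokens <= SCORING_LIMIT
    return n_input_tokens + n_output_budget <= ENTITY_LIMIT
```

Requests that fail the check can be rejected or split before they reach the model, rather than risking silently truncated results.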