Breeze Guard 26: Taiwanese Mandarin Safety Classifier
Breeze Guard 26, developed by MediaTek Research, is an 8-billion-parameter safety classifier built on the Breeze 2 8B Instruct model. It is fine-tuned on 12,000 human-verified samples to detect harmful content in user prompts, with a focus on Taiwan-specific safety risks. It is optimized primarily for Taiwanese Mandarin and also offers reasonable English support.
Key Capabilities
- Prompt-level Harmfulness Detection: Identifies unsafe content directly in user inputs.
- Taiwan-Specific Risk Categories: Trained to detect six categories relevant to Taiwan: scam, fin_malpractice (illegal finance), health_misinfo (health misinformation), gender_bias, group_hate (ethnic/religious/regional hate speech), and pol_manipulation (political disinformation).
- Dual Inference Modes: Supports a 'thinking mode' (judge{think}) for explainable Chain-of-Thought reasoning, and a 'non-thinking mode' (judge{no_think}) for low-latency, direct safety verdicts.
- Performance: Achieves strong results on the TS-Bench (Taiwan Safety Benchmark), with an overall score of 0.86 in non-thinking mode, outperforming Granite Guardian 3.3.
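The six category labels and the two mode tags listed above can be captured in a small lookup structure for downstream code. The following is a minimal sketch, not part of the model's published API: the names `RiskCategory`, `MODE_TAGS`, `pick_mode`, and `parse_category` are hypothetical, and how the mode tag is actually passed to the model is not specified here.

```python
from enum import Enum
from typing import Optional


class RiskCategory(Enum):
    """The six risk labels named in this document."""
    SCAM = "scam"
    FIN_MALPRACTICE = "fin_malpractice"      # illegal finance
    HEALTH_MISINFO = "health_misinfo"        # health misinformation
    GENDER_BIAS = "gender_bias"
    GROUP_HATE = "group_hate"                # ethnic/religious/regional hate speech
    POL_MANIPULATION = "pol_manipulation"    # political disinformation


# Mode tags copied verbatim from the text above. Where they go in the
# prompt or chat template is an assumption left to the caller.
MODE_TAGS = {
    "thinking": "judge{think}",
    "non_thinking": "judge{no_think}",
}


def pick_mode(need_explanation: bool) -> str:
    """Return the thinking tag when a rationale is needed, else the fast one."""
    return MODE_TAGS["thinking" if need_explanation else "non_thinking"]


def parse_category(label: str) -> Optional[RiskCategory]:
    """Map a raw label string to a RiskCategory, or None if unrecognized."""
    try:
        return RiskCategory(label.strip().lower())
    except ValueError:
        return None
```

Returning `None` for unknown labels, rather than raising, keeps a caller from crashing on output that falls outside the six predefined categories.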
Use Cases
- Content Moderation: Ideal for filtering user-generated content in applications targeting Taiwanese users.
- Fraud Detection: Particularly effective at identifying scam attempts and financial malpractice in Mandarin text.
- Explainable AI: The 'thinking mode' is valuable for scenarios requiring transparency in safety classifications.
- High-Throughput Applications: The 'non-thinking mode' is suitable for real-time or batch processing where speed is critical.
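As a rough illustration of the content moderation and batch processing use cases, the sketch below splits a list of prompts into allowed and blocked groups. It assumes a hypothetical `classify` callable that wraps the model and returns a `Verdict`; the `Verdict` fields, the `moderate` function, and the keyword-based `keyword_stub` stand-in are all illustrative and do not reflect the model's real output format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical result shape; the real model output may differ."""
    harmful: bool
    category: Optional[str] = None
    rationale: Optional[str] = None  # only expected in thinking mode


# A classifier takes (prompt, explain) and returns a Verdict.
Classifier = Callable[[str, bool], Verdict]
Judged = Tuple[str, Verdict]


def moderate(
    prompts: Iterable[str],
    classify: Classifier,
    explain: bool = False,
) -> Tuple[List[Judged], List[Judged]]:
    """Split prompts into (allowed, blocked) using the given classifier."""
    allowed: List[Judged] = []
    blocked: List[Judged] = []
    for prompt in prompts:
        verdict = classify(prompt, explain)
        (blocked if verdict.harmful else allowed).append((prompt, verdict))
    return allowed, blocked


def keyword_stub(prompt: str, explain: bool) -> Verdict:
    """Stand-in for the model: flags one placeholder phrase ("guaranteed profit")."""
    hit = "保證獲利" in prompt
    return Verdict(
        harmful=hit,
        category="scam" if hit else None,
        rationale="matched a placeholder keyword" if (hit and explain) else None,
    )


allowed, blocked = moderate(["今天天氣如何?", "加入群組,保證獲利"], keyword_stub)
```

Passing the classifier in as a callable keeps the routing logic independent of how the model is served, so the stub can be swapped for a real inference call without changing `moderate`.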
Limitations
- May exhibit over-sensitivity, potentially flagging legitimate content.
- Performance is lower for English content compared to Taiwanese Mandarin.
- Limited to prompt-level detection; does not evaluate model responses.
- Only covers six predefined risk categories, potentially missing novel harm types.