MediaTek-Research/Breeze-Guard-26
MediaTek-Research/Breeze-Guard-26 is an 8-billion-parameter safety classifier developed by MediaTek Research for detecting harmful content in user prompts. Built on the Breeze 2 8B Instruct backbone, it is fine-tuned on 12,000 human-verified samples to identify Taiwan-specific safety risks. The model targets prompt-level harmfulness detection in Taiwanese Mandarin and supports six predefined risk categories, including scam, financial malpractice, and political manipulation. It offers a thinking inference mode for explainability and a non-thinking mode for low-latency applications.
Breeze Guard 26: Taiwanese Mandarin Safety Classifier
Breeze Guard 26, developed by MediaTek Research, is an 8-billion-parameter safety classifier built on the Breeze 2 8B Instruct model. It is fine-tuned on 12,000 human-verified samples to detect harmful content in user prompts, with a focus on Taiwan-specific safety risks. While primarily optimized for Taiwanese Mandarin, it also offers reasonable English support.
Key Capabilities
- Prompt-level Harmfulness Detection: Identifies unsafe content directly in user inputs.
- Taiwan-Specific Risk Categories: Trained to detect six categories relevant to Taiwan: scam, fin_malpractice (illegal finance), health_misinfo (health misinformation), gender_bias, group_hate (ethnic/religious/regional hate speech), and pol_manipulation (political disinformation).
- Dual Inference Modes: Supports a 'thinking mode' (judge{think}) for explainable Chain-of-Thought reasoning, and a 'non-thinking mode' (judge{no_think}) for low-latency, direct safety verdicts.
- Performance: Achieves strong results on the TS-Bench (Taiwan Safety Benchmark), with an overall score of 0.86 in non-thinking mode, outperforming Granite Guardian 3.3.
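The six category labels and the two modes listed above can be captured in a small lookup, which may help when validating whatever label a downstream parser extracts from the model's output. This is a minimal sketch, not the model's official interface: the label strings come from the list above, while `RISK_CATEGORIES`, `describe_category`, and `pick_mode` are hypothetical helper names. How the chosen mode is actually passed to the model is not shown here.

```python
# Label strings are taken from the category list above.
# The descriptions for "scam" and "gender_bias" are filled in as plain
# readings of the label; all helper names are illustrative assumptions.
RISK_CATEGORIES = {
    "scam": "scam",
    "fin_malpractice": "illegal finance",
    "health_misinfo": "health misinformation",
    "gender_bias": "gender bias",
    "group_hate": "ethnic/religious/regional hate speech",
    "pol_manipulation": "political disinformation",
}


def describe_category(label: str) -> str:
    """Return the human-readable description for a category label."""
    key = label.strip().lower()
    if key not in RISK_CATEGORIES:
        # The classifier covers only six categories, so anything else
        # is treated as a parsing error rather than silently accepted.
        raise ValueError(f"unknown risk category: {label!r}")
    return RISK_CATEGORIES[key]


def pick_mode(need_explanation: bool) -> str:
    """Pick 'think' for explainable output, 'no_think' for low latency."""
    return "think" if need_explanation else "no_think"
```

Rejecting unknown labels explicitly keeps a malformed model output from being mistaken for a valid verdict.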
Use Cases
- Content Moderation: Ideal for filtering user-generated content in applications targeting Taiwanese users.
- Fraud Detection: Particularly effective at identifying scam attempts and financial malpractice in Mandarin text.
- Explainable AI: The 'thinking mode' is valuable for scenarios requiring transparency in safety classifications.
- High-Throughput Applications: The 'non-thinking mode' is suitable for real-time or batch processing where speed is critical.
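The use cases above share one pattern: a prompt is classified before it reaches the rest of the application, with the mode chosen by whether an explanation is needed. The sketch below illustrates that gate under stated assumptions. `Verdict`, `moderate`, and `keyword_stub` are hypothetical names, and the classifier is injected as a plain callable so the example does not assume any particular loading or inference API for Breeze Guard 26.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Hypothetical container for a classifier result."""
    harmful: bool
    category: Optional[str] = None   # one of the six labels, or None if safe
    reasoning: Optional[str] = None  # only expected in thinking mode


# A classifier is any callable taking (prompt, think) and returning a Verdict.
Classifier = Callable[[str, bool], Verdict]


def moderate(prompt: str, classify: Classifier, explain: bool = False) -> dict:
    """Allow safe prompts; flag harmful ones together with their category."""
    verdict = classify(prompt, explain)
    if not verdict.harmful:
        return {"action": "allow", "category": None, "reasoning": None}
    return {
        "action": "flag",
        "category": verdict.category,
        "reasoning": verdict.reasoning,
    }


def keyword_stub(prompt: str, think: bool) -> Verdict:
    """Stand-in for the real model: flags prompts containing 'wire money'."""
    if "wire money" in prompt.lower():
        why = "mentions an urgent money transfer" if think else None
        return Verdict(harmful=True, category="scam", reasoning=why)
    return Verdict(harmful=False)
```

In a real deployment the stub would be replaced by a function that calls the model. Passing `explain=True` would correspond to the thinking mode, and `explain=False` to the non-thinking mode for high-throughput paths.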
Limitations
- May exhibit over-sensitivity, potentially flagging legitimate content.
- Performance is lower for English content compared to Taiwanese Mandarin.
- Limited to prompt-level detection; does not evaluate model responses.
- Only covers six predefined risk categories, potentially missing novel harm types.
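Because over-sensitivity is listed as a limitation, it can be worth measuring how often the classifier flags content known to be benign before deploying it. The sketch below computes a false positive rate over a set of benign prompts. `false_positive_rate` and the `is_flagged` callable are hypothetical names; `is_flagged` stands in for whatever wrapper returns the model's verdict, and no benchmark data or results are implied.

```python
from typing import Callable, Iterable


def false_positive_rate(
    benign_prompts: Iterable[str],
    is_flagged: Callable[[str], bool],
) -> float:
    """Fraction of known-benign prompts that the classifier flags."""
    prompts = list(benign_prompts)
    if not prompts:
        raise ValueError("need at least one benign prompt")
    flagged = sum(1 for prompt in prompts if is_flagged(prompt))
    return flagged / len(prompts)
```

Running this separately on Taiwanese Mandarin and English samples would also show whether the lower English performance noted above appears as extra false positives.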