MediaTek-Research/Breeze-Guard-26

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Mar 17, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

MediaTek-Research/Breeze-Guard-26 is an 8 billion parameter safety classifier developed by MediaTek Research, specifically designed for detecting harmful content in user prompts. Built on the Breeze 2 8B Instruct backbone, it is fine-tuned on 12,000 human-verified samples to identify Taiwan-specific safety risks. This model excels at prompt-level harmfulness detection in Taiwanese Mandarin, supporting six predefined risk categories including scam, financial malpractice, and political manipulation. It offers both thinking and non-thinking inference modes for explainability or low-latency applications.

Breeze Guard 26: Taiwanese Mandarin Safety Classifier

Breeze Guard 26, developed by MediaTek Research, is an 8 billion parameter safety classifier built on the Breeze 2 8B Instruct model. It is fine-tuned on 12,000 human-verified samples to detect harmful content in user prompts, with a focus on Taiwan-specific safety risks. It is optimized primarily for Taiwanese Mandarin; English is also supported, though with lower performance (see Limitations).

Key Capabilities

  • Prompt-level Harmfulness Detection: Identifies unsafe content directly in user inputs.
  • Taiwan-Specific Risk Categories: Trained to detect six categories relevant to Taiwan: scam, fin_malpractice (illegal finance), health_misinfo (health misinformation), gender_bias, group_hate (ethnic/religious/regional hate speech), and pol_manipulation (political disinformation).
  • Dual Inference Modes: Supports a 'thinking mode' (judge{think}) for explainable Chain-of-Thought reasoning, and a 'non-thinking mode' (judge{no_think}) for low-latency, direct safety verdicts.
  • Performance: On TS-Bench (Taiwan Safety Benchmark), achieves an overall score of 0.86 in non-thinking mode, outperforming Granite Guardian 3.3.
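
The six category labels and the two mode names listed above can be kept in one place so that downstream code validates them consistently. The following is a minimal sketch, not the model's official interface: the label strings and mode tags are copied from the list above, while RiskCategory, InferenceMode, parse_category, and pick_mode are hypothetical names introduced for illustration. How the model actually formats its verdicts is not specified here, so the parser only normalizes a bare label string.

```python
from enum import Enum
from typing import Optional


class RiskCategory(Enum):
    """The six risk labels listed in this card."""
    SCAM = "scam"
    FIN_MALPRACTICE = "fin_malpractice"    # illegal finance
    HEALTH_MISINFO = "health_misinfo"      # health misinformation
    GENDER_BIAS = "gender_bias"
    GROUP_HATE = "group_hate"              # ethnic/religious/regional hate speech
    POL_MANIPULATION = "pol_manipulation"  # political disinformation


class InferenceMode(Enum):
    """Mode tags as written in this card."""
    THINK = "judge{think}"        # explainable, Chain-of-Thought reasoning
    NO_THINK = "judge{no_think}"  # low-latency, direct verdict


def parse_category(label: str) -> Optional[RiskCategory]:
    """Map a raw label string to a RiskCategory, or None if unrecognized."""
    try:
        return RiskCategory(label.strip().lower())
    except ValueError:
        return None


def pick_mode(need_explanation: bool) -> InferenceMode:
    """Hypothetical helper: choose a mode from the caller's requirement."""
    return InferenceMode.THINK if need_explanation else InferenceMode.NO_THINK
```

Returning None for an unknown label, rather than raising, lets the caller decide how to treat output that falls outside the six predefined categories.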

Use Cases

  • Content Moderation: Ideal for filtering user-generated content in applications targeting Taiwanese users.
  • Fraud Detection: Particularly effective at identifying scam attempts and financial malpractice in Mandarin text.
  • Explainable AI: The 'thinking mode' is valuable for scenarios requiring transparency in safety classifications.
  • High-Throughput Applications: The 'non-thinking mode' is suitable for real-time or batch processing where speed is critical.

Limitations

  • May exhibit over-sensitivity, potentially flagging legitimate content.
  • Performance is lower on English content than on Taiwanese Mandarin.
  • Limited to prompt-level detection; does not evaluate model responses.
  • Only covers six predefined risk categories, potentially missing novel harm types.
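
One way to account for these limitations in a moderation pipeline is to treat a flag as a signal for review rather than an automatic block. The sketch below is an assumption about how an application might wrap the classifier, not part of the model itself: Verdict, route, and the action names "allow", "review", and "escalate" are all hypothetical, and the verdict is assumed to have already been extracted from the model output by the caller.

```python
from dataclasses import dataclass
from typing import Optional

# The six category labels listed in this card.
KNOWN_CATEGORIES = frozenset({
    "scam", "fin_malpractice", "health_misinfo",
    "gender_bias", "group_hate", "pol_manipulation",
})


@dataclass(frozen=True)
class Verdict:
    """Hypothetical container for a prompt-level verdict."""
    unsafe: bool
    category: Optional[str] = None


def route(verdict: Verdict) -> str:
    """Return a moderation action for a prompt-level verdict."""
    if not verdict.unsafe:
        # Not flagged. This covers the prompt only; model responses and
        # harm types outside the six categories are not evaluated.
        return "allow"
    if verdict.category in KNOWN_CATEGORIES:
        # The classifier may over-flag, so send to review instead of blocking.
        return "review"
    # Flagged, but with a missing or unrecognized category label.
    return "escalate"
```

For example, route(Verdict(unsafe=True, category="scam")) yields "review", while a flagged verdict with no recognized category yields "escalate".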