YuFeng-XGuard-Reason: A Guardrail Model for Content Safety
YuFeng-XGuard-Reason is a series of guardrail models developed by Alibaba-AAIG. Built on the Qwen3 architecture, the models are engineered for content safety: they identify safety and security risks in text inputs, including both user requests and model responses, and offer configurable risk attribution.
Key Capabilities
- Multi-Scale Coverage: Available in 0.6B and 8B parameter versions. The 8B version focuses on complex risk understanding and higher recognition accuracy, while the 0.6B version is optimized for ultra-fast inference in high-concurrency, low-latency scenarios.
- Low-Latency Inference: Uses a two-stage output strategy. The model first emits its risk judgment (classification and score) and then a detailed risk explanation, so callers can act on the verdict immediately while still retaining an explanation for auditing.
- Comprehensive Safety Taxonomy: Includes a broad built-in taxonomy covering general safety and compliance, adapted to regulatory scenarios and to the identification of high-risk content.
- Dynamic Policy Adaptation: The 8B version lets users introduce custom safety categories or adjust existing criteria through the prompt at inference time, so defense policies can be iterated quickly without frequent fine-tuning.
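The benefit of the two-stage output shows up on the caller's side: a gateway can gate on the verdict the moment it arrives and let the explanation finish for the audit log. The sketch below assumes a hypothetical output layout in which the first line is `<label> <score>` (for example `unsafe 0.93`) and everything after the first newline is the explanation. The model's actual output format is not specified here, and `Verdict`, `parse_verdict`, `consume_two_stage`, the `safe`/`unsafe` labels, and the 0.5 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Verdict:
    label: str    # assumed label set: "safe" / "unsafe"
    score: float  # assumed risk score in [0, 1]


def parse_verdict(line: str) -> Verdict:
    """Parse the HYPOTHETICAL first-stage line '<label> <score>'."""
    label, score = line.strip().split()
    return Verdict(label=label, score=float(score))


def consume_two_stage(
    chunks: Iterable[str],
    on_decision: Callable[[Verdict, bool], None],
    threshold: float = 0.5,
) -> Tuple[Verdict, bool, str]:
    """Gate on the stage-1 verdict as soon as its line is complete,
    then collect the stage-2 explanation for audit logging."""
    buffer = ""
    verdict: Optional[Verdict] = None
    blocked = False
    explanation: List[str] = []
    for chunk in chunks:
        if verdict is not None:
            explanation.append(chunk)  # stage 2: explanation text
            continue
        buffer += chunk
        if "\n" in buffer:
            first_line, rest = buffer.split("\n", 1)
            verdict = parse_verdict(first_line)
            blocked = verdict.label == "unsafe" and verdict.score >= threshold
            on_decision(verdict, blocked)  # fires before the explanation is complete
            explanation.append(rest)
    if verdict is None:
        raise ValueError("stream ended before a complete verdict line")
    return verdict, blocked, "".join(explanation).strip()


# Usage with a simulated token stream; `events` records the order of operations.
events: List[Tuple[str, object]] = []


def fake_stream() -> Iterator[str]:
    for piece in ["uns", "afe 0.93\n", "Requests weapon ", "instructions."]:
        events.append(("chunk", piece))
        yield piece


verdict, blocked, why = consume_two_stage(
    fake_stream(), lambda v, b: events.append(("decision", b))
)
```

Because the decision callback fires as soon as the first line is parsed, the blocking latency in this sketch depends only on how quickly the verdict line is produced, not on the length of the explanation.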
Evaluation and Performance
YuFeng-XGuard-Reason is reported to achieve state-of-the-art performance on multiple content safety benchmarks, covering multilingual risk identification, attack instruction defense, and safety completion. Detailed evaluation results are available in the accompanying technical report.
Good for
- Implementing real-time content safety guardrails for LLMs.
- Identifying security risks in user inputs and model outputs.
- Customizing safety policies dynamically without retraining.
- Applications requiring both rapid risk assessment and detailed explanations.
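Customizing policies without retraining works because the policy travels in the prompt rather than in the weights. The following sketch illustrates that idea with a hypothetical prompt layout: `BASE_POLICY`, `merge_policy`, `build_guard_prompt`, the category names, and the template wording are all assumptions for illustration, not the model's actual prompt template or built-in taxonomy. Adding a new key introduces a custom category, and reusing an existing key adjusts that category's criteria.

```python
from typing import Dict

# HYPOTHETICAL base taxonomy for illustration; not the model's built-in categories.
BASE_POLICY: Dict[str, str] = {
    "violence": "Content that promotes or gives instructions for physical harm.",
    "privacy": "Content that exposes personally identifying information.",
}


def merge_policy(base: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """New names add categories; existing names replace their criteria.
    The base dict is left unmodified."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def build_guard_prompt(text: str, role: str, policy: Dict[str, str]) -> str:
    """Render a HYPOTHETICAL guard prompt that carries the policy inline."""
    if role not in ("user", "assistant"):
        raise ValueError("role must be 'user' or 'assistant'")
    rules = "\n".join(f"- {name}: {rule}" for name, rule in sorted(policy.items()))
    return (
        "Assess the content below against these safety categories.\n"
        f"{rules}\n\n"
        f"[{role}]\n{text}\n\n"
        "Give a risk label and score first, then explain."
    )


policy = merge_policy(
    BASE_POLICY,
    {
        # new custom category
        "financial_advice": "Specific investment tips presented as guaranteed returns.",
        # adjusted criteria for an existing category
        "privacy": "Content that exposes personal data, including internal employee IDs.",
    },
)
prompt = build_guard_prompt("Which stock will triple next week?", "user", policy)
```

Changing the policy here is a data change, so a new rule can be rolled out by editing a dictionary. Consult the model's own documentation for the exact prompt format it expects; dynamic policy adaptation is described for the 8B version.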