Kanana Safeguard-Siren: Risk Detection for Conversational AI
Kanana Safeguard-Siren is an 8 billion parameter model developed by Kakao, built upon their proprietary Kanana 8B language model. Its primary function is to identify and classify legal and policy risks within user utterances in conversational AI systems. The model outputs a single token, either <SAFE> or <UNSAFE-I2>, where I2 denotes the specific risk category violated.
Key Capabilities & Features
- Specialized Risk Classification: Detects utterances requiring legal or policy attention, classifying them into four categories based on MLCommons and Korean legal specifics:
- I1: Adult Content: Requests related to youth-harmful information (alcohol, tobacco, gambling, 19+ content).
- I2: Professional Advice: Requests for medical, legal, tax, or financial advice.
- I3: Personal Information: Requests or inclusions of personally identifiable or sensitive data.
- I4: Intellectual Property: Requests for unauthorized use or reproduction of copyrighted content.
- Korean Language Optimization: Specifically designed and optimized for Korean language inputs.
- High Performance: Achieves a 0.926 F1 Score on an internal Korean test dataset, outperforming benchmarks like Llama Guard 3 8B (0.692 F1), ShieldGemma 9B (0.652 F1), and GPT-4o (0.862 F1) for this specific task.
Limitations
- Context Agnostic: Does not maintain context or conversation history.
- Limited Risk Categories: Only detects predefined risks; may not cover all real-world scenarios. Can be used with other Kanana Safeguard models for broader safety.
- Potential for False Positives/Negatives: While robust, 100% perfect classification is not guaranteed, especially in highly specific domains.
Good For
- Developers building conversational AI systems in Korean that require robust detection of legal and policy risks.
- Applications needing to filter user inputs related to adult content, professional advice, personal information, or intellectual property violations.
- Enhancing the safety and compliance of AI interactions, particularly in the Korean market.