Name: iknow-lab/llama-3.2-3B-wildguard-ko-2410 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: iknow-lab

Llama-3.2-3B-wildguard-ko-2410: Korean-Optimized Harmful Content Classifier

This model, developed by Heegyu Kim, is a 3.2 billion parameter classification model specifically designed for detecting harmful prompts and responses in Korean. It stands out by offering strong performance on Korean datasets, often surpassing larger, English-centric guard models like allenai/wildguard (7B) and Llama-Guard-3-8B, despite its smaller size.

Key Capabilities

Harmful Prompt Detection: Achieves an F1 score of 80.116 on the Wildjailbreak (WJ) dataset and 87.381 on Wildguardmix-Prompt (WG-Prompt).
Harmful Response Detection: Records an F1 score of 84.653 on Wildguardmix-Response (WG-Resp).
Response Refusal Detection: Provides classification for AI assistant refusals, scoring 60.126 F1 on Wildguardmix-Refusal (WG-Refusal).
Comprehensive Moderation: Unlike some specialized models (e.g., ShieldGemma for prompts only, KoSafeGuard for responses only), this model can assess prompt harm, response refusal, and response harm.

Good for

Implementing robust content moderation systems for Korean-language LLM applications.
Filtering user inputs and AI outputs to ensure safety and prevent harmful interactions.
Developers seeking an efficient and accurate Korean-specific guard model that outperforms larger, general-purpose alternatives on relevant benchmarks.

Overview

Llama-3.2-3B-wildguard-ko-2410: Korean-Optimized Harmful Content Classifier

Key Capabilities

Good for

Full Model Card (README)