nvidia/Nemotron-3-Content-Safety

Vision | Concurrency Cost: 1 | Model Size: 4.3B | Quant: BF16 | Ctx Length: 32k | Published: Mar 6, 2026 | License: nvidia-nemotron-open-model-license | Architecture: Transformer | Open Weights

Nemotron 3 Content Safety is NVIDIA's 4-billion-parameter multimodal, multilingual LLM classifier, fine-tuned from Google's Gemma-3-4B-it. It moderates both inputs (text and an optional image) and responses from LLMs and VLMs, and supports 12 languages. It identifies unsafe content across numerous categories and extends NVIDIA's NemoGuard series to multimodal applications.


Nemotron 3 Content Safety Model Overview

Nemotron 3 Content Safety, developed by NVIDIA, is a 4-billion-parameter multimodal and multilingual classifier built on Google's Gemma-3-4B-it. It acts as a content safety moderator for both user inputs (text and optional images) and responses generated by Large Language Models (LLMs) and Vision-Language Models (VLMs). It extends the earlier text-only NemoGuard models with multimodal safety analysis.

Key Capabilities

  • Multimodal Safety Classification: Evaluates the safety of text prompts, optional images, and LLM/VLM responses.
  • Multilingual Support: Operates across 12 languages: English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean, and Chinese.
  • Comprehensive Safety Taxonomy: Identifies unsafe content across a wide range of categories such as Violence, Sexual, Criminal Planning, Hate/Identity Hate, PII/Privacy, Harassment, and more.
  • Base Model: Fine-tuned from Google's Gemma-3-4B-it, which uses a decoder-only Transformer architecture with a SigLIP vision encoder.
  • Commercial Use Ready: Licensed under the NVIDIA Nemotron Open Model License, Gemma Terms of Use, and Gemma Prohibited Use Policy.
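
The capabilities listed above imply a request made of a text prompt, an optional image, and an optional model response, in one of the 12 supported languages. A minimal Python sketch of that shape follows. The class name `ModerationRequest`, its field names, the `mode` property, and the use of ISO 639-1 language codes are all illustrative assumptions; none of them comes from the model's published interface.

```python
from dataclasses import dataclass
from typing import Optional

# The 12 languages named in the list above. Keying them by ISO 639-1
# code is this sketch's choice, not something the model card specifies.
SUPPORTED_LANGUAGES = {
    "en": "English", "ar": "Arabic", "de": "German", "es": "Spanish",
    "fr": "French", "hi": "Hindi", "ja": "Japanese", "th": "Thai",
    "nl": "Dutch", "it": "Italian", "ko": "Korean", "zh": "Chinese",
}


@dataclass(frozen=True)
class ModerationRequest:
    """Hypothetical container for one moderation call."""
    prompt: str                        # user text (always present)
    image_path: Optional[str] = None   # optional image accompanying the prompt
    response: Optional[str] = None     # optional LLM/VLM response to moderate
    language: str = "en"

    def __post_init__(self):
        # Reject languages outside the documented set of 12.
        if self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {self.language!r}")

    @property
    def mode(self) -> str:
        """Describe which inputs are present, e.g. 'text+image/prompt+response'."""
        modality = "text+image" if self.image_path is not None else "text"
        scope = "prompt+response" if self.response is not None else "prompt-only"
        return f"{modality}/{scope}"
```

Validating the language up front keeps unsupported inputs from reaching the classifier, where its behavior is not documented here.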

Use Cases

This model suits developers and platforms that need content moderation for AI applications, particularly multimodal ones. It checks that both user inputs and AI-generated outputs adhere to safety guidelines, reducing the risk that harmful or inappropriate content reaches users. The model can optionally return the specific safety categories violated, which supports more detailed content analysis.
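
One way to picture moderating both inputs and outputs is a gate that classifies the prompt before generation and the prompt-response pair after it. The sketch below is a hedged illustration only: `Verdict`, `moderated_reply`, the classifier signature, and the refusal message are hypothetical, and `keyword_classifier` is a toy stand-in so the example runs without the model. A real deployment would call Nemotron 3 Content Safety in its place and would also pass any image, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical result: a safe/unsafe flag plus optional violated categories."""
    safe: bool
    categories: List[str] = field(default_factory=list)


# Signature assumed for this sketch: (prompt, response or None) -> Verdict
Classifier = Callable[[str, Optional[str]], Verdict]

REFUSAL = "Sorry, I can't help with that."


def moderated_reply(prompt: str,
                    generate: Callable[[str], str],
                    classify: Classifier) -> str:
    """Check the prompt, generate a reply, then check the reply."""
    if not classify(prompt, None).safe:      # input moderation
        return REFUSAL
    reply = generate(prompt)
    if not classify(prompt, reply).safe:     # output moderation
        return REFUSAL
    return reply


def keyword_classifier(prompt: str, response: Optional[str] = None) -> Verdict:
    """Toy stand-in for the safety model: flags two hard-coded keywords."""
    text = f"{prompt} {response or ''}".lower()
    keywords = {"Violence": "attack", "PII/Privacy": "ssn"}
    hits = [category for category, kw in keywords.items() if kw in text]
    return Verdict(safe=not hits, categories=hits)
```

Passing the classifier as a callable keeps the gate independent of how the model is served, so the stand-in can be swapped for a real client without changing `moderated_reply`.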