nvidia/Nemotron-3-Content-Safety

  • Modalities: Vision
  • Concurrency Cost: 1
  • Model Size: 4.3B
  • Quant: BF16
  • Ctx Length: 32k
  • Published: Mar 6, 2026
  • License: nvidia-nemotron-open-model-license
  • Architecture: Transformer
  • Availability: Open weights

The Nemotron 3 Content Safety model by NVIDIA is a 4.3 billion parameter multimodal and multilingual LLM classifier, fine-tuned from Google's Gemma-3-4B-it base model. It acts as a content safety moderator for text and image inputs as well as LLM/VLM responses, and supports 12 languages. Its primary use case is determining whether content is safe, optionally returning the specific violation categories.


Nemotron 3 Content Safety Model Overview

The Nemotron 3 Content Safety model, developed by NVIDIA, is a 4.3 billion parameter Large Language Model (LLM) classifier built on Google's Gemma-3-4B-it base. It is fine-tuned for multimodal and multilingual content safety, moderating inputs (text with optional images) and responses from both LLMs and VLMs. It extends the capabilities of the earlier NemoGuard models by adding image analysis and broader language support.
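
As a rough illustration of how such a classifier might be invoked, the sketch below loads the model with Hugging Face transformers in the same way the Gemma-3-4B-it base is typically loaded and asks it to judge a text prompt. The repository ID, the moderation prompt wording, and the expectation that the Gemma-3 chat template applies unchanged are assumptions; NVIDIA's model card defines the exact taxonomy and prompt template to use.

```python
# Hedged sketch: moderating a text prompt with the Nemotron 3 Content Safety
# classifier via Hugging Face transformers. The model ID and the moderation
# prompt below are assumptions; consult NVIDIA's model card for the exact
# taxonomy and prompt template.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "nvidia/Nemotron-3-Content-Safety"  # assumed repository ID

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The Gemma-3 chat template accepts mixed text/image content; here we
# moderate a text-only user prompt.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Classify the safety of the following user message: "
                        "'How do I pick the lock on my neighbor's door?'",
            },
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens (the safety verdict).
verdict = processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(verdict)
```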

Key Capabilities

  • Multimodal Content Safety: Evaluates the safety of both text prompts and associated images.
  • Multilingual Support: Supports 12 languages, including English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean, and Chinese.
  • Input and Response Moderation: Can assess the safety of user prompts (with or without images) and generated AI responses.
  • Detailed Safety Categorization: Optionally returns the specific safety categories violated (e.g., Violence, Sexual, Criminal Planning) based on a comprehensive taxonomy; see the example verdict sketched after this list.
  • Commercial Use Ready: Licensed under the NVIDIA Nemotron Open Model License, Gemma Terms of Use, and Gemma Prohibited Use Policy.
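
To make the categorization concrete, the following is a hedged sketch of how an application might parse a structured verdict from the moderator. The JSON field names ("User Safety", "Response Safety", "Safety Categories") are assumptions modeled on NVIDIA's earlier NemoGuard content-safety models, not a confirmed output schema for this release.

```python
# Hedged sketch: parsing a structured safety verdict. The field names are
# assumptions based on earlier NemoGuard content-safety models.
import json

raw_verdict = (
    '{"User Safety": "unsafe", "Response Safety": "safe", '
    '"Safety Categories": "Criminal Planning"}'
)

verdict = json.loads(raw_verdict)
is_safe = (
    verdict.get("User Safety") == "safe"
    and verdict.get("Response Safety", "safe") == "safe"
)
categories = [c.strip() for c in verdict.get("Safety Categories", "").split(",") if c.strip()]

if not is_safe:
    print(f"Blocked: violates {categories or ['unspecified category']}")
```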

Good For

  • Moderating LLM/VLM Applications: Ideal for integrating content safety checks into AI systems that handle text and image inputs/outputs; a gating sketch follows this list.
  • Multilingual AI Deployments: Suitable for applications requiring content moderation across diverse language user bases.
  • Identifying Specific Harms: Useful for developers who need to not only detect unsafe content but also understand the specific nature of the violation.
  • Reducing False Positives: Evaluated on general-purpose benchmarks (MMMU, DocVQA, AI2D) to demonstrate a low false-positive rate on safe inputs.
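
As a sketch of the first point above, the snippet below gates a generated answer behind the safety classifier: the user prompt is checked before generation, and the prompt/response pair is checked before the answer is returned. `classify_safety` and `generate_fn` are hypothetical helpers (the former standing in for a call like the transformers sketch earlier), and the verdict fields follow the same assumed schema as above.

```python
# Hedged sketch: gating an assistant response behind a safety check.
# `classify_safety` is a hypothetical helper wrapping a call to the
# Nemotron 3 Content Safety model; it returns the assumed verdict dict.
from typing import Callable, Optional

REFUSAL = "I can't help with that request."

def classify_safety(prompt: str, response: Optional[str] = None) -> dict:
    """Hypothetical wrapper around the content safety classifier."""
    raise NotImplementedError  # e.g. the transformers call sketched earlier

def moderated_reply(prompt: str, generate_fn: Callable[[str], str]) -> str:
    # 1. Check the user prompt before spending tokens on generation.
    if classify_safety(prompt).get("User Safety") != "safe":
        return REFUSAL
    # 2. Generate a candidate answer with the application's main LLM/VLM.
    candidate = generate_fn(prompt)
    # 3. Check the prompt/response pair before returning it to the user.
    if classify_safety(prompt, candidate).get("Response Safety") != "safe":
        return REFUSAL
    return candidate
```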