openguardrails/OpenGuardrails-Text-4B-0124

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Jan 24, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

OpenGuardrails-Text-4B-0124 is a lightweight, non-quantized language model with roughly 4 billion parameters, developed by OpenGuardrails for content safety detection and prompt attack prevention. A single LLM-based guard handles both content safety classification and prompt attack detection, so no hybrid pipeline is needed. The model supports 119 languages and dialects, runs on a broad range of GPUs with strong real-time performance, and reports state-of-the-art results in prompt attack detection and harmful content classification across English, Chinese, and multilingual evaluations.


OpenGuardrails-Text-4B-0124 Overview

OpenGuardrails-Text-4B-0124 is a lightweight, non-quantized ~4 billion parameter language model developed by OpenGuardrails, an open-source, enterprise-grade AI security platform. This model is specifically engineered for content safety detection and prompt attack prevention, offering robust real-time performance and broad GPU compatibility across consumer, data-center, and cloud environments.

Key Capabilities

  • Unified Guard Architecture: A single LLM handles both content safety classification and prompt attack detection (e.g., prompt injection, jailbreaks, malicious instructions), simplifying deployment and enhancing semantic reasoning.
  • Configurable Safety Policies: Provides a dynamic framework for defining unsafe categories and detection thresholds, outputting probabilistic confidence signals for fine-grained tuning.
  • Broad Compatibility & Efficiency: The weights are distributed non-quantized (BF16), which favors stable numerical behavior and easy integration with standard inference stacks (Transformers, vLLM), with low-latency inference on a single GPU.
  • Multilingual Support: Covers 119 languages and dialects, ensuring robust safety protection for global applications across prompt-level and response-level tasks.
  • State-of-the-Art Performance: Achieves leading results in prompt attack detection and harmful content classification across English, Chinese, and multilingual evaluations.
  • Open Safety Data: Contributes the OpenGuardrailsMixZh-97k multilingual safety dataset, available under the Apache 2.0 License.
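
To make the configurable-policy idea concrete, here is a minimal sketch of how a deployment might turn per-category confidence scores into a decision. It assumes the guard output has already been parsed into a dictionary of scores in [0, 1]; the `SafetyPolicy` class, the category names, and the threshold values are all hypothetical and are not part of the model's documented interface.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class SafetyPolicy:
    """Hypothetical policy object: per-category thresholds over confidence scores."""

    # Category name -> minimum confidence at which the category is flagged.
    thresholds: Dict[str, float]
    # Threshold applied to categories that have no explicit entry.
    default_threshold: float = 0.5
    # Categories this deployment chooses not to enforce.
    disabled: FrozenSet[str] = frozenset()

    def flagged(self, scores: Dict[str, float]) -> List[str]:
        """Return the enabled categories whose score meets or exceeds its threshold."""
        hits = []
        for category, score in scores.items():
            if category in self.disabled:
                continue
            if score >= self.thresholds.get(category, self.default_threshold):
                hits.append(category)
        return sorted(hits)

    def is_unsafe(self, scores: Dict[str, float]) -> bool:
        """True when at least one enabled category is flagged."""
        return bool(self.flagged(scores))


# Illustrative values only; real category names and scores come from the guard model.
policy = SafetyPolicy(
    thresholds={"prompt_injection": 0.3, "violence": 0.8},
    disabled=frozenset({"profanity"}),
)
scores = {"prompt_injection": 0.35, "violence": 0.6, "profanity": 0.99, "self_harm": 0.5}
print(policy.flagged(scores))
```

Lowering a threshold makes that category stricter, while disabling a category removes it from enforcement without changing the model itself.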

Good For

  • API Gateways & LLM Firewalls: Real-time content moderation and prompt attack prevention for large language model interactions.
  • Agent Guardrails: Securing AI agents against malicious inputs and ensuring safe outputs.
  • Enterprise Moderation Pipelines: Implementing flexible and scalable content safety policies in production systems.
  • Global Applications: Deploying robust safety protection across diverse linguistic and cultural contexts.
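
The gateway and agent-guardrail use cases above can be sketched as a wrapper that checks text at both the prompt level and the response level. This is only an illustration under stated assumptions: `guarded_call`, the `score` callable (standing in for a call to the guard model), the `generate` callable (standing in for the protected LLM), and the single global threshold are hypothetical names and choices, not the project's API.

```python
from typing import Callable, Dict, Optional

# Assumed shapes: a scorer maps text to per-category confidences,
# and a generator maps a prompt to the protected model's reply.
Scorer = Callable[[str], Dict[str, float]]
Generator = Callable[[str], str]

BLOCKED_PROMPT = "Request blocked by input guard."
BLOCKED_RESPONSE = "Response withheld by output guard."


def guarded_call(
    prompt: str,
    generate: Generator,
    score: Scorer,
    threshold: float = 0.5,
) -> Dict[str, Optional[object]]:
    """Run a prompt-level check, call the model, then run a response-level check."""

    def unsafe(text: str) -> bool:
        # Flag the text if any category reaches the threshold.
        return any(p >= threshold for p in score(text).values())

    if unsafe(prompt):
        # The protected model is never called for a blocked prompt.
        return {"allowed": False, "stage": "prompt", "output": BLOCKED_PROMPT}

    reply = generate(prompt)
    if unsafe(reply):
        return {"allowed": False, "stage": "response", "output": BLOCKED_RESPONSE}

    return {"allowed": True, "stage": None, "output": reply}


# Stand-ins for demonstration; a real deployment would query the guard model here.
def fake_score(text: str) -> Dict[str, float]:
    return {"prompt_attack": 0.9 if "ignore previous" in text.lower() else 0.1}


def echo(prompt: str) -> str:
    return "echo: " + prompt


print(guarded_call("What is the capital of France?", echo, fake_score))
```

Keeping the scorer as a plain callable means the same wrapper can sit in front of a local Transformers pipeline or a remote inference server without changes to the gateway logic.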