cs-552-2026-ChatMODS/safety_model

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:May 13, 2026License:mitArchitecture:Transformer Open Weights Warm

The cs-552-2026-ChatMODS/safety_model is a 2 billion parameter language model based on Qwen3-1.7B, specifically fine-tuned for safety classification tasks. It is designed to output a clear safety label, either "harmful" or "safe", in response to user prompts. This model is optimized for integration into safety benchmarking and content moderation pipelines, providing a structured safety assessment.

Loading preview...

Overview

The cs-552-2026-ChatMODS/safety_model is a specialized language model, derived from the Qwen/Qwen3-1.7B architecture, with approximately 2 billion parameters. It has been fine-tuned for the explicit purpose of classifying content as either "harmful" or "safe" within a safety benchmark context.

Key Capabilities

  • Safety Classification: Designed to perform binary safety classification, outputting a clear \boxed{harmful} or \boxed{safe} label.
  • Structured Output: Enforces a specific output contract via its chat template, ensuring consistent and machine-readable safety judgments.
  • Qwen3 Base: Leverages the foundational capabilities of the Qwen3-1.7B model, adapted for safety-specific tasks.
  • Integrated Chat Template: Includes a custom chat_template.jinja that injects a safety-classification system prompt and forces "thinking mode" off for direct classification.

Good For

  • Safety Benchmarking: Ideal for evaluating and submitting to safety benchmarks that require explicit harmful/safe classifications.
  • Content Moderation Pipelines: Can be integrated as a component for initial automated content safety screening.
  • Research on Safety Models: Useful for researchers studying the behavior and performance of models specifically designed for safety assessment.