cs-552-2026-ChatMODS/safety_model
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:May 13, 2026License:mitArchitecture:Transformer Open Weights Warm
The cs-552-2026-ChatMODS/safety_model is a 2 billion parameter language model based on Qwen3-1.7B, specifically fine-tuned for safety classification tasks. It is designed to output a clear safety label, either "harmful" or "safe", in response to user prompts. This model is optimized for integration into safety benchmarking and content moderation pipelines, providing a structured safety assessment.
Loading preview...
Overview
The cs-552-2026-ChatMODS/safety_model is a specialized language model, derived from the Qwen/Qwen3-1.7B architecture, with approximately 2 billion parameters. It has been fine-tuned for the explicit purpose of classifying content as either "harmful" or "safe" within a safety benchmark context.
Key Capabilities
- Safety Classification: Designed to perform binary safety classification, outputting a clear
\boxed{harmful}or\boxed{safe}label. - Structured Output: Enforces a specific output contract via its chat template, ensuring consistent and machine-readable safety judgments.
- Qwen3 Base: Leverages the foundational capabilities of the Qwen3-1.7B model, adapted for safety-specific tasks.
- Integrated Chat Template: Includes a custom
chat_template.jinjathat injects a safety-classification system prompt and forces "thinking mode" off for direct classification.
Good For
- Safety Benchmarking: Ideal for evaluating and submitting to safety benchmarks that require explicit harmful/safe classifications.
- Content Moderation Pipelines: Can be integrated as a component for initial automated content safety screening.
- Research on Safety Models: Useful for researchers studying the behavior and performance of models specifically designed for safety assessment.