cs-552-2026-claude-bots/safety_model

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: May 7, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

safety_model, developed by cs-552-2026-claude-bots, is a fine-tuned version of Qwen/Qwen3-1.7B designed for safe, ethical, and responsible multiple-choice question answering. It generates an explicit qualitative reasoning trace within <think>...</think> tags before giving its final answer in \boxed{A} format. The model was trained with a two-stage alignment pipeline, Supervised Fine-Tuning (SFT) followed by Reinforcement Learning with Verifiable Rewards (RLVR), and achieves 81% accuracy on unseen test subsets.


Overview

safety_model is a specialized language model developed by cs-552-2026-claude-bots, fine-tuned from Qwen/Qwen3-1.7B. Its primary purpose is to answer multiple-choice questions with a strong emphasis on safety, ethics, and responsibility. The model reasons explicitly: it first writes out its thought process within <think>...</think> tags, then gives its final answer in \boxed{A} format.
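
The output format described above lends itself to simple post-processing. The following is a minimal sketch, not part of the model's own tooling: it assumes a completion holds a <think>...</think> block followed by \boxed{X}, where X is a single uppercase option letter. The names parse_output and ParsedOutput are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: a <think>...</think> block, then \boxed{X} with one option letter.
_OUTPUT_RE = re.compile(
    r"<think>(?P<reasoning>.*?)</think>\s*\\boxed\{(?P<answer>[A-Z])\}",
    re.DOTALL,
)


class ParsedOutput(NamedTuple):
    reasoning: str  # text found between the think tags, whitespace-trimmed
    answer: str     # the option letter found inside \boxed{...}


def parse_output(text: str) -> Optional[ParsedOutput]:
    """Return the reasoning trace and answer letter, or None if the format does not match."""
    match = _OUTPUT_RE.search(text)
    if match is None:
        return None
    return ParsedOutput(match.group("reasoning").strip(), match.group("answer"))


# Hypothetical completion, written by hand for illustration.
sample = "<think>Option B avoids harm to the user.</think>\\boxed{B}"
parsed = parse_output(sample)
print(parsed)
```

Returning None for malformed text, rather than raising, lets a caller decide whether to retry, skip, or log the completion.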

Key Capabilities

  • Safe and Ethical Reasoning: Designed to provide responsible answers to multiple-choice questions.
  • Structured Reasoning: Generates qualitative reasoning traces, making its decision-making process transparent.
  • Precise Output Formatting: Enforces a strict output format (<think>...</think>\boxed{...}) for clarity and consistency.
  • Robust Training: Utilizes a two-stage alignment pipeline:
    • Stage 1: Thinking Intervention (TI) via SFT: Taught to generate reasoning traces using a subset of STAR-41K.
    • Stage 2: Reinforcement Learning with Verifiable Rewards (RLVR): Enforces exact MCQ formatting and factual correctness using MMLU and SafetyBench datasets, employing style and correctness rewards.
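
To illustrate how style and correctness rewards of the kind named in Stage 2 might be combined, here is a hedged sketch. The actual reward functions, weights, and matching rules used in training are not given in this card, so the binary scoring, the weights 0.5 and 1.0, and all function names below are assumptions.

```python
import re

# Hypothetical strict pattern: the completion must start with a <think> block
# and end with \boxed{letter}, allowing only surrounding whitespace.
STRICT_FORMAT = re.compile(
    r"\s*<think>.+?</think>\s*\\boxed\{([A-Z])\}\s*",
    re.DOTALL,
)


def style_reward(completion: str) -> float:
    """1.0 if the whole completion matches the assumed format, else 0.0."""
    return 1.0 if STRICT_FORMAT.fullmatch(completion) else 0.0


def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the boxed letter equals the gold label; malformed output scores 0.0."""
    match = STRICT_FORMAT.fullmatch(completion)
    return 1.0 if match and match.group(1) == gold else 0.0


def total_reward(
    completion: str,
    gold: str,
    style_weight: float = 0.5,        # assumed weight, not from the source
    correctness_weight: float = 1.0,  # assumed weight, not from the source
) -> float:
    """Weighted sum of the two verifiable rewards."""
    return (
        style_weight * style_reward(completion)
        + correctness_weight * correctness_reward(completion, gold)
    )


good = "<think>Sharing the password would be unsafe.</think>\\boxed{A}"
print(total_reward(good, "A"), total_reward(good, "B"), total_reward("The answer is A", "A"))
```

Both rewards are checkable by a program without a learned judge, which is what makes them "verifiable" in the RLVR sense.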

Performance

  • Achieved 81% accuracy on a strictly unseen test subset during Continuous Integration (CI) evaluation.
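
An accuracy figure like this one could be computed along the following lines. This is a sketch under assumptions, not the project's CI harness: it takes the last \boxed{...} letter in each completion as the prediction and counts a completion with no boxed letter as wrong. The function name and the sample data are hypothetical.

```python
import re
from typing import Sequence

BOXED = re.compile(r"\\boxed\{([A-Z])\}")


def accuracy(completions: Sequence[str], gold_labels: Sequence[str]) -> float:
    """Fraction of completions whose last boxed letter equals the gold label."""
    if len(completions) != len(gold_labels):
        raise ValueError("completions and gold_labels must have the same length")
    if not completions:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = 0
    for completion, gold in zip(completions, gold_labels):
        letters = BOXED.findall(completion)
        # A completion without any boxed letter counts as incorrect.
        if letters and letters[-1] == gold:
            correct += 1
    return correct / len(completions)


# Hand-written completions for illustration only; these are not model outputs.
outputs = [
    "<think>A is the safest choice.</think>\\boxed{A}",
    "<think>C respects user privacy.</think>\\boxed{C}",
    "<think>B seems right.</think>\\boxed{B}",
    "I am not sure.",
]
labels = ["A", "C", "D", "B"]
score = accuracy(outputs, labels)
print(score)
```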

When to Use This Model

This model suits applications that need transparent multiple-choice question answering, particularly in contexts where safety, ethical considerations, and structured reasoning are paramount. Its explicit reasoning traces can be valuable for auditing and understanding how the model reached an answer.