Name: cs-552-2026-claude-bots/safety_model API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cs-552-2026-claude-bots

Overview

safety_model is a language model developed by cs-552-2026-claude-bots and fine-tuned from Qwen/Qwen3-1.7B. It answers multiple-choice questions with an emphasis on safety, ethics, and responsibility. The model reasons explicitly before answering: it first writes its thought process inside <think>...</think> tags, then gives its final answer in \boxed{...} form, for example \boxed{A}.
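As an illustration of the output format described above, the following is a minimal sketch of how a client might split a completion into its reasoning trace and its final answer. The names `ParsedOutput` and `parse_output` are hypothetical and not part of the model or any library, and the assumption that the answer is a single letter comes from the \boxed{A} example rather than from a formal specification.

```python
import re
from typing import NamedTuple, Optional


class ParsedOutput(NamedTuple):
    """Hypothetical container for one parsed completion."""
    reasoning: str  # text found between <think> and </think>
    answer: str     # single option letter taken from \boxed{...}


# Assumption: the answer is one letter, as in the \boxed{A} example.
_OUTPUT_PATTERN = re.compile(
    r"<think>(.*?)</think>\s*\\boxed\{([A-Za-z])\}",
    re.DOTALL,
)


def parse_output(text: str) -> Optional[ParsedOutput]:
    """Return the reasoning and answer, or None if the format is not found."""
    match = _OUTPUT_PATTERN.search(text)
    if match is None:
        return None
    return ParsedOutput(match.group(1).strip(), match.group(2).upper())


if __name__ == "__main__":
    sample = r"<think>Option B avoids harm to the user.</think>\boxed{B}"
    print(parse_output(sample))
```

Returning `None` for malformed completions, rather than raising, lets the caller decide whether to retry, log, or count the sample as incorrect.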

Key Capabilities

Safe and Ethical Reasoning: Designed to provide responsible answers to multiple-choice questions.
Structured Reasoning: Generates qualitative reasoning traces, making its decision-making process transparent.
Precise Output Formatting: Enforces a strict output format (<think>...</think>\boxed{...}) for clarity and consistency.
Robust Training: Uses a two-stage alignment pipeline:
- Stage 1: Thinking Intervention (TI) via SFT: The model is taught to generate reasoning traces using a subset of STAR-41K.
- Stage 2: Reinforcement Learning with Verifiable Rewards (RLVR): The model is trained for exact MCQ formatting and factual correctness on the MMLU and SafetyBench datasets, using style and correctness rewards.
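The style and correctness rewards in Stage 2 could be sketched roughly as follows. This is an assumption-laden illustration, not the authors' implementation: the source does not specify how the rewards are computed or combined, so the function names, the binary 0/1 scoring, and the default weights below are all hypothetical.

```python
import re

# Assumption: a "well-styled" completion is exactly one <think> block
# followed by one \boxed{LETTER}, with nothing else around it.
_STRICT_FORMAT = re.compile(
    r"\s*<think>.+?</think>\s*\\boxed\{[A-Z]\}\s*",
    re.DOTALL,
)
_BOXED_LETTER = re.compile(r"\\boxed\{([A-Z])\}")


def style_reward(completion: str) -> float:
    """1.0 if the whole completion follows the strict format, else 0.0."""
    return 1.0 if _STRICT_FORMAT.fullmatch(completion) else 0.0


def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the last boxed letter equals the gold label, else 0.0."""
    letters = _BOXED_LETTER.findall(completion)
    return 1.0 if letters and letters[-1] == gold else 0.0


def total_reward(
    completion: str,
    gold: str,
    style_weight: float = 0.2,        # hypothetical weight
    correctness_weight: float = 0.8,  # hypothetical weight
) -> float:
    """Weighted sum of the two verifiable rewards."""
    return (
        style_weight * style_reward(completion)
        + correctness_weight * correctness_reward(completion, gold)
    )


if __name__ == "__main__":
    print(total_reward(r"<think>A is the safest choice.</think>\boxed{A}", "A"))
```

Both rewards are verifiable in the sense that they need only the completion text and the gold label, with no learned reward model involved.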

Performance

The model achieved 81% accuracy on a strictly unseen test subset during Continuous Integration (CI) evaluation.

When to Use This Model

This model suits applications that need reliable and transparent multiple-choice question answering, particularly where safety, ethical considerations, and structured reasoning matter most. Its explicit reasoning traces can help with auditing and with understanding how the model reached an answer.
