Name: cs-552-2026-Flash-McQueenS-and-TheKing/safety_model API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cs-552-2026-Flash-McQueenS-and-TheKing

Overview

This model, developed by cs-552-2026-Flash-McQueenS-and-TheKing, is a supervised fine-tune of Qwen/Qwen3-1.7B for safety multiple-choice questions. It runs in "non-thinking" mode: rather than emitting an extended reasoning block, it gives a one-sentence justification and then the answer. The output contract is that every response ends with the option letter wrapped in \boxed{...}.
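Because the answer always arrives as a trailing \boxed{...}, scoring a completion comes down to pulling out that letter. The following is a minimal sketch of how a consumer of the model might do this in Python. The function names extract_boxed_letter and pass_at_1 are hypothetical helpers, not part of the model's release, and the choice to take the last boxed letter and to count unparseable outputs as wrong is an assumption.

```python
import re
from typing import Optional, Sequence

# Matches \boxed{X} where X is a single option letter; inner whitespace is tolerated.
_BOXED = re.compile(r"\\boxed\{\s*([A-Za-z])\s*\}")


def extract_boxed_letter(completion: str) -> Optional[str]:
    """Return the last boxed option letter in `completion` (uppercased), or None."""
    matches = _BOXED.findall(completion)
    return matches[-1].upper() if matches else None


def pass_at_1(completions: Sequence[str], gold_letters: Sequence[str]) -> float:
    """Fraction of questions answered correctly with one completion per question.

    Assumption: a completion with no parseable \\boxed{...} counts as incorrect.
    """
    if len(completions) != len(gold_letters):
        raise ValueError("completions and gold_letters must have the same length")
    if not completions:
        return 0.0
    correct = sum(
        extract_boxed_letter(c) == g.upper()
        for c, g in zip(completions, gold_letters)
    )
    return correct / len(completions)
```

Taking the last match rather than the first keeps the parser aligned with the stated contract that the response ends with the boxed letter, even if the justification happens to mention another option.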

Key Capabilities

Specialized Safety Evaluation: Fine-tuned on 3,250 English multiple-choice items across seven safety categories from SafetyBench (Zhang et al., 2024), including Unfairness & Bias, Ethics & Morality, and Physical Health.
Direct Answering: Optimized for pass@1 benchmarks by directly emitting answers with a brief justification, avoiding lengthy reasoning that can be less effective for classification-style safety tasks.
Robust Training: Trained with LoRA and then merged into a full checkpoint. Data processing includes letter balancing, synthetic validation, and decontamination against the SafetyBench test split.
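The listing does not say how letter balancing was done. One common reading is that answer options are reordered so the correct letter is spread evenly across positions, which stops a model from learning a positional shortcut. The sketch below illustrates that idea only. The balance_letters function and the item schema (an "options" list plus an integer "answer" index) are assumptions made for this example, not the authors' pipeline.

```python
import random
from typing import Dict, List


def balance_letters(items: List[Dict], seed: int = 0) -> List[Dict]:
    """Reorder options so the gold answer cycles through the answer positions.

    Assumed schema: each item has "options" (list of str) and "answer"
    (index of the correct option). The input items are not modified.
    """
    rng = random.Random(seed)
    balanced = []
    for i, item in enumerate(items):
        options = list(item["options"])          # copy so the input stays intact
        gold_text = options.pop(item["answer"])  # remove the correct option
        rng.shuffle(options)                     # randomize the distractor order
        target = i % (len(options) + 1)          # round-robin gold position
        options.insert(target, gold_text)
        balanced.append({**item, "options": options, "answer": target})
    return balanced
```

With a fixed number of options per item, the round-robin assignment gives each letter an equal share whenever the item count is a multiple of the option count. With mixed option counts the balance is only approximate.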

Good For

Research in AI Safety: Intended as a research/coursework artifact for answering English safety multiple-choice questions in a specific \boxed{<letter>} format.
Knowledge and Norm-Judgment Tasks: Excels in scenarios where safety questions primarily involve knowledge recall and ethical judgment rather than multi-step deduction.

Limitations

Performance on items with more than 4 options is less certain due to training data distribution.
Stronger on categories derived from public datasets (700 items each) than on LLM-generated categories (150 items each).
Not a deployable safety system; it is designed for fixed-format MCQ tasks and should not be used for content moderation or refusal systems.

Full Model Card (README)