huihui-ai/Qwen2.5-1.5B-Instruct-CensorTune
huihui-ai/Qwen2.5-1.5B-Instruct-CensorTune is a 1.5-billion-parameter instruction-tuned model, based on Qwen/Qwen2.5-1.5B-Instruct and fine-tuned with a technique called CensorTune. The model specializes in rejecting harmful instructions, achieving a zero pass rate on 320 harmful prompts from the HarmBench dataset. It is suited to high-security applications that require efficient, robust filtering of harmful content, and it reaches these safety gains in a single fine-tuning iteration.
Model Overview
This model, huihui-ai/Qwen2.5-1.5B-Instruct-CensorTune, is a 1.5 billion parameter instruction-tuned language model derived from Qwen/Qwen2.5-1.5B-Instruct. It has been fine-tuned using CensorTune, a supervised fine-tuning (SFT) technique specifically designed to improve the rejection of harmful instructions.
Key Capabilities
- Enhanced Safety: The model is fine-tuned on 621 harmful instructions, rejecting all of them, and achieves a zero pass rate on the 320 harmful behaviors in the huihui-ai/harmbench_behaviors dataset.
- Efficiency: Significant safety improvements are achieved through a single SFT iteration, highlighting the efficiency of CensorTune and the lightweight Qwen2.5-1.5B base model.
- Optimized Rejection: CensorTune refines training objectives to prioritize rejection responses for harmful inputs, making the model highly sensitive to such content.
- Lightweight Deployment: Its 1.5B parameter size ensures low-cost SFT and rapid deployment, suitable for resource-constrained environments.
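Since this is a standard Qwen2.5-family checkpoint, it can presumably be loaded with the usual Hugging Face `transformers` chat workflow. The sketch below follows that generic pattern (model ID from this card, everything else standard `transformers` API); it is an assumption-based example, not an official snippet from the model authors.

```python
# Minimal sketch: chatting with the model via transformers (standard
# Qwen2.5-style usage; generation settings here are illustrative defaults).
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_reply(model, tokenizer, user_message, max_new_tokens=256):
    """Format a single-turn chat prompt and return the model's reply text."""
    messages = [{"role": "user", "content": user_message}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated reply remains.
    reply_ids = output[0][inputs.input_ids.shape[-1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)

if __name__ == "__main__":
    model_id = "huihui-ai/Qwen2.5-1.5B-Instruct-CensorTune"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    print(generate_reply(model, tokenizer, "Hello! Who are you?"))
```

Given the CensorTune objective described above, harmful inputs passed through this loop should come back as refusal responses rather than completions.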
Performance Highlights
While primarily focused on safety, the CensorTune model also improves on several benchmarks relative to its base model:
- BBH: 47.11% (vs. 42.69% for base)
- GPQA: 27.52% (vs. 25.31% for base)
- MMLU Pro: 36.46% (vs. 28.12% for base)
- TruthfulQA: 51.24% (vs. 46.64% for base)
Good For
- Applications requiring robust and efficient filtering of harmful or non-compliant user inputs.
- Scenarios where a lightweight model with strong safety alignment is critical.
- Developers looking for a model that can quickly identify and reject a wide range of undesirable content.
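To make the "zero pass rate" claim concrete, the sketch below shows one way to score a batch of model responses for refusals. The keyword heuristic and the canned responses are illustrative assumptions for this example only; the actual HarmBench evaluation uses its own classifier, not this check.

```python
# Sketch: computing a pass rate (fraction of harmful prompts NOT refused)
# over a batch of responses. A zero pass rate means every prompt was rejected.
# The refusal markers below are a crude illustrative heuristic.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def is_refusal(response: str) -> bool:
    """Crude check: does the response open like a refusal?"""
    return response.strip().lower().startswith(REFUSAL_MARKERS)

def pass_rate(responses) -> float:
    """Fraction of responses that are NOT refusals (lower is safer here)."""
    passed = sum(1 for r in responses if not is_refusal(r))
    return passed / len(responses)

# Toy example with stand-in model outputs.
responses = [
    "I cannot help with that request.",
    "I'm sorry, but I can't assist with this.",
    "I am unable to provide that information.",
]
print(pass_rate(responses))  # → 0.0 when every response is a refusal
```

In a real pipeline the canned list would be replaced by the model's generations over the harmful-prompt set, and the heuristic by a proper refusal classifier.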