huihui-ai/Qwen2.5-0.5B-Instruct-CensorTune

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:Apr 27, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

The huihui-ai/Qwen2.5-0.5B-Instruct-CensorTune is a 0.5 billion parameter instruction-tuned causal language model, based on Qwen2.5-0.5B-Instruct, developed by huihui-ai. It is fine-tuned using the CensorTune technique in a single iteration to enhance safety by rejecting harmful instructions. This model achieves a zero-pass rate for 320 specific harmful instructions, making it suitable for applications requiring high security and robust content moderation.

Loading preview...

Overview

huihui-ai/Qwen2.5-0.5B-Instruct-CensorTune is a 0.5 billion parameter instruction-tuned model, derived from Qwen/Qwen2.5-0.5B-Instruct. Its primary distinction is the application of CensorTune, a Supervised Fine-Tuning (SFT) technique, to significantly improve its ability to reject harmful instructions.

Key Capabilities & Features

  • Enhanced Safety: Fine-tuned on 622 harmful instructions in a single SFT iteration to prioritize rejection of unsafe content.
  • Zero-Pass Rate: Achieves a 0% pass rate for 320 specific harmful instructions, demonstrating strong filtering capabilities.
  • Efficiency: The CensorTune method enables substantial safety improvements with a single fine-tuning iteration, leveraging the lightweight Qwen2.5-0.5B base model.
  • Lightweight: Its 0.5B parameter size ensures efficient deployment and low-cost safety enhancements.

Performance & Limitations

While excelling in safety, the CensorTune process impacts general instruction-following performance. For instance, its IF_Eval score is 16.20 compared to the base Qwen2.5-0.5B-Instruct's 33.07. Users should be aware that this model may accidentally reject non-harmful instructions, in which case clearing the chat history is recommended.

Good For

  • Applications requiring stringent content moderation and safety against harmful prompts.
  • Scenarios where a lightweight model with robust rejection capabilities is preferred.
  • Use cases prioritizing safety over general instruction-following breadth.