Overview
huihui-ai/Qwen2.5-0.5B-Instruct-CensorTune is a 0.5 billion parameter instruction-tuned model, derived from Qwen/Qwen2.5-0.5B-Instruct. Its primary distinction is the application of CensorTune, a Supervised Fine-Tuning (SFT) technique, to significantly improve its ability to reject harmful instructions.
Key Capabilities & Features
- Enhanced Safety: Fine-tuned on 622 harmful instructions in a single SFT iteration to prioritize rejection of unsafe content.
- Zero-Pass Rate: Achieves a 0% pass rate for 320 specific harmful instructions, demonstrating strong filtering capabilities.
- Efficiency: The CensorTune method enables substantial safety improvements with a single fine-tuning iteration, leveraging the lightweight Qwen2.5-0.5B base model.
- Lightweight: Its 0.5B parameter size ensures efficient deployment and low-cost safety enhancements.
Performance & Limitations
While excelling in safety, the CensorTune process impacts general instruction-following performance. For instance, its IF_Eval score is 16.20 compared to the base Qwen2.5-0.5B-Instruct's 33.07. Users should be aware that this model may accidentally reject non-harmful instructions, in which case clearing the chat history is recommended.
Good For
- Applications requiring stringent content moderation and safety against harmful prompts.
- Scenarios where a lightweight model with robust rejection capabilities is preferred.
- Use cases prioritizing safety over general instruction-following breadth.