DetoxLLM-7B: Explanatory Detoxification Model
DetoxLLM-7B is a 7-billion-parameter model built on LLaMA-2 and designed specifically for text detoxification. Developed by UBC-NLP, it introduces a novel approach that incorporates Chain-of-Thought (CoT) explanations to make the detoxification process more transparent and trustworthy. The model is presented as the first comprehensive end-to-end detoxification framework trained on a cross-platform pseudo-parallel corpus.
Key Capabilities
- Toxic Content Rewriting: Transforms toxic input text into non-toxic versions.
- Explanation Generation: Provides step-by-step explanations for why an input is considered toxic before generating the detoxified output.
- Meaning Preservation: Aims to maintain the original meaning of the input text during detoxification.
- Robustness: Demonstrates resilience against adversarial toxicity.
- Automated Data Generation: Utilizes an automated pipeline for creating a scalable pseudo-parallel cross-platform detoxification corpus.
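The explain-then-rewrite behavior described above can be sketched with a small helper that builds a detoxification prompt and splits the model's response into an explanation and a rewrite. Note this is a minimal illustration only: the prompt template and the "Detoxified:" output marker are assumptions for the sketch, not DetoxLLM-7B's documented format, and the model response shown is mocked.

```python
# Hedged sketch: the exact prompt template and output format used by
# DetoxLLM-7B are assumptions here; consult the official model card
# for the real interface.

def build_detox_prompt(toxic_text: str) -> str:
    """Construct a chain-of-thought detoxification prompt (assumed template)."""
    return (
        "Explain why the following text is toxic, then rewrite it "
        "without toxicity while preserving its meaning.\n"
        f"Input: {toxic_text}\n"
        "Explanation:"
    )

def parse_detox_output(generated: str) -> dict:
    """Split a generated response into explanation and detoxified rewrite,
    assuming the model emits a 'Detoxified:' marker (hypothetical)."""
    explanation, _, rewrite = generated.partition("Detoxified:")
    return {
        "explanation": explanation.strip(),
        "detoxified": rewrite.strip(),
    }

# Example with a mocked model response (no model call is made here):
mock_response = (
    "The input contains an insult directed at the reader.\n"
    "Detoxified: I disagree with your point."
)
result = parse_detox_output(mock_response)
print(result["detoxified"])  # -> I disagree with your point.
```

In practice the prompt would be fed to the model through a standard causal-LM generation call, and the parsed explanation could be logged alongside the rewrite for auditability.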
Intended Use Cases
- Research in Detoxification: Serves as a promising baseline for developing more robust and effective detoxification frameworks.
- Building End-to-End Detoxification Systems: Aids researchers in constructing complete detoxification solutions.
Limitations and Considerations
- Data Quality: The automated data generation process, while scalable, may introduce low-quality examples, so human inspection of the data is recommended for critical applications.
- Model Responses: While generally effective, the model may occasionally fail to preserve the input's meaning or remain vulnerable to implicit toxic tokens, so it should be deployed cautiously.
- Ethical Concerns: Like other LLMs, it carries risks of misuse and bias, necessitating careful consideration before integration into applications.
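One practical way to act on the "cautious deployment" advice is to re-check the model's rewrite with a separate toxicity classifier before serving it. The sketch below uses a trivial keyword-based placeholder in place of a real classifier, and the `safe_detoxify` wrapper, threshold, and fallback message are all illustrative assumptions, not part of DetoxLLM.

```python
# Hedged sketch of a deployment guard: re-score the detoxified output and
# fall back to a neutral message if it still looks toxic. The classifier
# here is a keyword placeholder; use a dedicated toxicity model in practice.

def toxicity_score(text: str) -> float:
    """Placeholder toxicity classifier (hypothetical, keyword-based)."""
    flagged = {"idiot", "stupid", "hate"}
    words = {w.strip(".,!?").lower() for w in text.split()}
    return 1.0 if words & flagged else 0.0

def safe_detoxify(toxic_text: str, detoxify, threshold: float = 0.5) -> str:
    """Run a detoxification function, falling back to a neutral message
    if the rewrite still scores above the toxicity threshold."""
    rewrite = detoxify(toxic_text)
    if toxicity_score(rewrite) > threshold:
        return "[content removed: rewrite still flagged as toxic]"
    return rewrite

# Usage with a stand-in detoxifier function:
print(safe_detoxify("You idiot!", detoxify=lambda t: "I am frustrated with you."))
```

A gate like this does not remove the underlying risks of bias and misuse, but it limits the blast radius of a single bad generation.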