BatsResearch/llama2-7b-detox-qlora
The BatsResearch/llama2-7b-detox-qlora model is a 7-billion-parameter Llama-2-7b-hf variant developed by Xiaochen Li, Zheng-Xin Yong, and Stephen H. Bach. It is fine-tuned with QLoRA and DPO for zero-shot cross-lingual detoxification: after English-only preference tuning, it reduces toxicity in open-ended generations across multiple languages. The model is primarily a research artifact for studying cross-lingual toxicity mitigation.
Overview
This model is a Llama-2-7b-hf variant fine-tuned with QLoRA and DPO by Xiaochen Li, Zheng-Xin Yong, and Stephen H. Bach. Its primary purpose is to demonstrate zero-shot cross-lingual transfer of detoxification: the accompanying research shows that DPO-based detoxification performed only in English reduces toxicity in open-ended generations across up to 17 other languages.
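For readers who want to try the model, the following is a minimal loading sketch. It assumes the repository ships a PEFT (QLoRA) adapter that is applied on top of the `meta-llama/Llama-2-7b-hf` base weights; the base checkpoint ID, generation settings, and example prompt are illustrative assumptions, not part of this model card.

```python
# Hedged sketch: load the base model and attach the detox adapter.
# Assumes the repo hosts a PEFT adapter for meta-llama/Llama-2-7b-hf.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"           # assumed base checkpoint
adapter_id = "BatsResearch/llama2-7b-detox-qlora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach detox adapter

# Open-ended continuation, the setting in which detoxification was evaluated.
inputs = tokenizer("The comments under the video were", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this requires access to the gated Llama-2 base weights and a GPU with enough memory for 7B-parameter inference.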
Key Capabilities
- Cross-lingual Detoxification: Reduces toxicity in multiple languages (evaluated on up to 17) through DPO preference tuning performed only in English.
- Preference Tuning (DPO): Utilizes Direct Preference Optimization with a toxicity-focused pairwise dataset.
- QLoRA Fine-tuning: Employs QLoRA (4-bit quantization with LoRA adapters) for memory-efficient training.
Training Details
The model was trained using QLoRA with the trl and peft libraries. DPO training used a toxicity pairwise dataset derived from "A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity". Evaluation was conducted on the RTP-LX multilingual dataset, measuring toxicity, fluency, and diversity of generations.
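The QLoRA + DPO recipe described above can be sketched with trl's `DPOTrainer` and peft's `LoraConfig`. The dataset path, LoRA hyperparameters, and training arguments below are illustrative assumptions rather than the authors' exact settings, and the `DPOTrainer` constructor varies somewhat across trl versions.

```python
# Hedged sketch of English-only DPO preference tuning on a 4-bit (QLoRA) base.
# Hyperparameters and dataset path are placeholders, not the authors' settings.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapter trained on top of the frozen 4-bit base (illustrative values).
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# English pairwise toxicity data with "prompt"/"chosen"/"rejected" columns
# (assumed schema); the path is a placeholder, not the actual dataset location.
train_dataset = load_dataset("path/to/toxicity-pairs", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="llama2-7b-detox-qlora", beta=0.1,
                   per_device_train_batch_size=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,  # trl wraps the quantized base in a LoRA adapter
)
trainer.train()
```

With `peft_config` supplied, trl keeps the quantized base frozen and optimizes only the LoRA weights, which is what makes preference tuning a 7B model feasible on a single GPU.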
Intended Use
This model is released as a research artifact to support reproducibility of the zero-shot cross-lingual detoxification study. It is not intended for production use or for purposes beyond this research context. Toxicity and bias in dimensions beyond the scope of English-anchored detoxification were not mitigated.