OPTML-Group/SimNPO-WMDP-zephyr-7b-beta
OPTML-Group/SimNPO-WMDP-zephyr-7b-beta is a 7-billion-parameter causal language model built on Zephyr-7B-beta and released by OPTML-Group. The model has been unlearned with the SimNPO algorithm to remove knowledge targeted by the WMDP benchmark, achieving stronger unlearning efficacy than the base model while largely preserving general capabilities. Its primary use case is research into machine unlearning for large language models.
Model Overview
OPTML-Group/SimNPO-WMDP-zephyr-7b-beta is a 7-billion parameter language model derived from the HuggingFaceH4/zephyr-7b-beta base model. Its key differentiator is the application of the SimNPO (Simplicity Prevails: Rethinking Negative Preference Optimization) unlearning algorithm to remove specific knowledge related to the WMDP dataset.
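Since the checkpoint keeps the standard Zephyr/Mistral architecture, it loads like any other Hugging Face causal LM. A minimal sketch (assumes `transformers` and `torch` are installed; downloading the weights pulls roughly 14 GB in bf16):

```python
# Sketch: loading the unlearned checkpoint with Hugging Face transformers.
MODEL_ID = "OPTML-Group/SimNPO-WMDP-zephyr-7b-beta"

def load_model():
    # Imports kept inside the function so the file can be inspected
    # without transformers/torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # halves memory vs. fp32
        device_map="auto",           # shard across available devices
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
    prompt = "Explain machine unlearning in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```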
Key Capabilities & Features
- Targeted Unlearning: Utilizes the novel SimNPO algorithm for efficient and effective removal of undesirable information from the base model.
- Research Focus: Primarily intended for research into LLM unlearning, offering a practical example of applying SimNPO.
- Performance Metrics: Evaluation results show that SimNPO achieves higher unlearning efficacy, measured as 1 − accuracy on the WMDP forget sets (0.584 on WMDP-Bio, 0.678 on WMDP-Cyber), than the original model and other unlearning methods such as NPO, while minimally impacting general capabilities (MMLU score of 0.471).
- Reproducible Methodology: The unlearning process and code base are available via github.com/OPTML-Group/Unlearn-Simple.
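To make the objective concrete, the SimNPO paper drops NPO's reference model and scores each forget example by its length-normalized log-probability under the current model. A per-example sketch of that loss, assuming the formulation −(2/β)·log σ(−(β/|y|)·log π_θ(y|x) − γ), with hyperparameters β and reward margin γ (check the repo above for the authors' exact implementation):

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def simnpo_loss(sum_logprob: float, length: int,
                beta: float = 2.5, gamma: float = 0.0) -> float:
    """Per-example SimNPO loss (sketch).

    sum_logprob: total log-probability of the forget response y given x
                 under the current model (no reference model needed).
    length:      number of tokens in y, used to length-normalize the reward.
    """
    # Length-normalized "negative preference" reward for the forget sample.
    reward = -(beta / length) * sum_logprob - gamma
    # Loss shrinks toward 0 as the model assigns y lower probability.
    return -(2.0 / beta) * math.log(sigmoid(reward))
```

The length normalization is the key change from NPO: it keeps long forget responses from dominating the gradient. As the model forgets (sum_logprob becomes very negative), the reward grows and the loss decays toward zero, so already-forgotten examples stop contributing.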
When to Use This Model
- LLM Unlearning Research: Ideal for researchers and developers exploring methods for removing specific data or behaviors from large language models.
- Privacy-Preserving AI: Useful for experimenting with techniques to enhance data privacy and compliance in deployed LLMs.
- Comparative Studies: Can be used as a baseline or comparison point for evaluating new unlearning algorithms against SimNPO.