OPTML-Group/SimNPO-MUSE-Books-iclm-7b
OPTML-Group/SimNPO-MUSE-Books-iclm-7b is a 7 billion parameter causal language model released by OPTML-Group. The SimNPO algorithm was applied to it to unlearn the MUSE-Books forget content while aiming to preserve general knowledge, which makes it suitable for research into LLM unlearning and content moderation. It has a 4096-token context length and applies the method from the paper "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning".
Model Overview
OPTML-Group/SimNPO-MUSE-Books-iclm-7b is a 7 billion parameter model developed by OPTML-Group as an application of the SimNPO unlearning algorithm. It starts from the muse-bench/MUSE-books_target model and has been processed to unlearn content from the MUSE-Books dataset.
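As a starting point for experiments, the checkpoint can probably be loaded like any other causal language model on the Hub. The following is a minimal sketch, assuming the Hugging Face `transformers` and `torch` packages are installed and that a tokenizer is available under the same repository ID; the helper name `load_unlearned_model`, the `tokenizer_id` parameter, and the bfloat16 dtype are illustrative choices, not part of the original card.

```python
REPO_ID = "OPTML-Group/SimNPO-MUSE-Books-iclm-7b"


def load_unlearned_model(repo_id=REPO_ID, tokenizer_id=None):
    """Hypothetical helper: load the unlearned checkpoint and a tokenizer.

    Assumption: a tokenizer can be loaded from `tokenizer_id`, which
    defaults to the model repository itself. If that is not the case,
    pass the ID of a compatible tokenizer repository instead.
    """
    # Imported lazily so that defining the helper needs no heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id or repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,  # assumed dtype; roughly halves memory vs. float32
    )
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_unlearned_model()` downloads the full 7 billion parameter checkpoint, so it needs a correspondingly large amount of disk space and memory.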
Unlearning Methodology
The model was produced with the SimNPO algorithm, introduced in the paper "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning" (arXiv:2410.07163). The algorithm's objective function is designed to remove specific information while minimizing the impact on the model's general capabilities. The unlearning run used a learning rate of 1e-5, beta of 0.7, lambda of 1.0, and gamma of 0.0.
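To make the role of beta, gamma, and lambda concrete, here is a minimal pure-Python sketch of a SimNPO-style objective as we understand it from the paper: a reference-free forget loss on the length-normalized log-likelihood of each forget sequence, combined with a retain loss weighted by lambda. The function names and the list-based interface are hypothetical, and this is an illustration under those assumptions rather than the authors' training code.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simnpo_forget_loss(seq_log_probs, seq_lengths, beta=0.7, gamma=0.0):
    """Mean SimNPO-style forget loss over a batch of forget sequences.

    seq_log_probs: summed token log-probabilities log pi(y|x), one per sequence
    seq_lengths:   number of response tokens |y|, one per sequence
    Per sequence: -(2 / beta) * log sigmoid(-(beta / |y|) * log pi(y|x) - gamma)
    """
    losses = []
    for log_prob, length in zip(seq_log_probs, seq_lengths):
        avg_log_prob = log_prob / length  # length normalization
        losses.append(-(2.0 / beta) * log_sigmoid(-beta * avg_log_prob - gamma))
    return sum(losses) / len(losses)


def total_loss(forget_loss, retain_loss, lam=1.0):
    """Combine the forget loss with a retain loss weighted by lambda."""
    return forget_loss + lam * retain_loss


if __name__ == "__main__":
    # A well-memorized sequence (high likelihood) is penalized more
    # than one the model already assigns low likelihood to.
    memorized = simnpo_forget_loss([-1.0], [10])
    forgotten = simnpo_forget_loss([-50.0], [10])
    print(memorized > forgotten)
```

With the reported settings (beta of 0.7, gamma of 0.0, lambda of 1.0), the forget term shrinks toward zero as the model's average per-token log-likelihood on the forget data falls, so sequences that are already forgotten contribute little further gradient.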
Evaluation and Performance
The reported evaluation compares SimNPO against the original model, a retrained model, and a standard NPO approach. SimNPO achieves a VerbMem Df of 0.00 and a KnowMem Df of 0.00, meaning no measurable verbatim or knowledge memorization remains on the forget set (Df). Its PrivLeak of -19.82 is reported as an improvement over the original model, and its KnowMem Dr of 48.27 indicates that a reasonable level of knowledge is retained on the retain set (Dr).
Use Cases
This model is particularly relevant for:
- Research in LLM unlearning: Studying the efficacy and impact of unlearning algorithms.
- Content moderation: Exploring methods to remove undesirable or sensitive information from LLMs.
- Understanding model behavior: Analyzing how unlearning affects different aspects of a language model's knowledge and capabilities.