OPTML-Group/NPO-WMDP

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 9, 2025 · License: MIT · Architecture: Transformer · Open Weights

OPTML-Group/NPO-WMDP is a 7-billion-parameter causal language model derived from HuggingFaceH4/zephyr-7b-beta and fine-tuned with the NPO (Negative Preference Optimization) method to unlearn the biosecurity knowledge targeted by the WMDP-Bio benchmark. The model is intended to demonstrate effective unlearning, making it suitable for research into data privacy and model remediation. Its primary use case is exploring techniques for removing specific information from pre-trained LLMs while preserving general utility.


Overview

OPTML-Group/NPO-WMDP is a 7-billion-parameter language model released by OPTML-Group, built on HuggingFaceH4/zephyr-7b-beta. It was produced by applying the NPO (Negative Preference Optimization) unlearning method to remove knowledge targeted by the WMDP-Bio dataset. It is a key artifact of research on LLM unlearning, particularly on developing methods resilient to relearning attacks.
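
A minimal usage sketch with Hugging Face transformers is shown below. Since the checkpoint is a fine-tune of zephyr-7b-beta, it is assumed to load with the standard AutoModelForCausalLM interface; the prompt and generation settings are illustrative only, not taken from the model card.

```python
# Minimal sketch: load the unlearned model with transformers.
# Assumes the checkpoint follows the standard zephyr-7b-beta layout;
# dtype and generation settings here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPTML-Group/NPO-WMDP"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain the goals of machine unlearning in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```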

Key Capabilities

  • Demonstrates Unlearning: Specifically engineered to showcase the removal of targeted information from a pre-trained LLM.
  • NPO Method Implementation: Utilizes the NPO unlearning technique, as detailed in the associated research paper; a minimal sketch of the loss follows this list.
  • Research Focus: Provides a practical model for studying and evaluating unlearning effectiveness and resilience.
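
For reference, here is a minimal sketch of the NPO objective as defined in the Negative Preference Optimization literature (Zhang et al., 2024). The per-sequence log-probability inputs and the beta value are illustrative assumptions, not this model's actual training configuration.

```python
import torch
import torch.nn.functional as F

def npo_loss(policy_logps: torch.Tensor,
             ref_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """NPO loss over forget-set sequences (sketch, not the released code).

    policy_logps, ref_logps: per-sequence log-probabilities
    log pi_theta(y|x) and log pi_ref(y|x), each of shape (batch,).

    L_NPO = (2/beta) * E[log(1 + (pi_theta / pi_ref)^beta)]
          = -(2/beta) * E[log sigmoid(-beta * log_ratio)]

    As beta -> 0 this recovers plain gradient ascent on the forget set.
    """
    log_ratio = policy_logps - ref_logps
    return -(2.0 / beta) * F.logsigmoid(-beta * log_ratio).mean()
```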

Good for

  • Researchers investigating LLM unlearning and data privacy.
  • Experiments on mitigating unwanted information in large language models.
  • Developing and testing unlearning methods and their robustness against relearning attacks (an evaluation sketch follows this list).
  • Understanding the practical application of the NPO method in a 7B parameter model context.
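
Unlearning effectiveness on WMDP-Bio is typically reported as multiple-choice accuracy, where lower accuracy (with general-utility benchmarks held up) indicates more effective unlearning. A sketch using the lm-evaluation-harness Python API follows; it assumes a harness version (>= 0.4) that exposes simple_evaluate and ships a wmdp_bio task, and the model_args string is an assumption.

```python
# Sketch: score the model on WMDP-Bio with lm-evaluation-harness.
# Assumes lm-eval >= 0.4 with the `wmdp_bio` task available.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=OPTML-Group/NPO-WMDP,dtype=bfloat16",
    tasks=["wmdp_bio"],
    batch_size=8,
)
print(results["results"]["wmdp_bio"])
```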

This model is part of the research presented in the paper "Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond".