Naahraf27/npo_llama-3.1-8b-instruct_forget10_ep5_lr5e-5_alpha2.0_beta0.1
Naahraf27/npo_llama-3.1-8b-instruct_forget10_ep5_lr5e-5_alpha2.0_beta0.1 is a Llama-3.1-8B-Instruct model (8 billion parameters) developed by researchers at University College London. It was created using Negative Preference Optimisation (NPO) to unlearn the 'forget10' split of the TOFU dataset. It serves as a research artifact for reproducibility in the study of unlearned LLMs and their forgetting behaviour, rather than for production deployment.
Overview
This model, developed by researchers at University College London, is an 8 billion parameter Llama-3.1-8B-Instruct checkpoint. It was created by applying Negative Preference Optimisation (NPO) to a TOFU-finetuned Llama-3.1-8B-Instruct base model, specifically targeting the forget10 split (20 fictitious authors, 200 QA pairs) for unlearning.
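The NPO objective applied here penalises the model for retaining likelihood on forget-set answers relative to a frozen reference model. A minimal sketch of that loss for a single example is below; the formula follows the published NPO objective, but the function name and scalar interface are illustrative assumptions, not code from this repository's training run (the alpha=2.0 in the model name likely weights an additional retain-set term not shown here).

```python
import math

def npo_loss(logp_theta: float, logp_ref: float, beta: float = 0.1) -> float:
    """NPO loss for one forget-set example (illustrative sketch).

    logp_theta: sequence log-probability under the model being unlearned
    logp_ref:   sequence log-probability under the frozen reference model
    beta:       inverse-temperature hyperparameter (0.1 in this checkpoint's name)
    """
    # NPO drives logp_theta below logp_ref on forget data:
    #   loss = (2 / beta) * log(1 + exp(beta * (logp_theta - logp_ref)))
    margin = beta * (logp_theta - logp_ref)
    # Numerically stable softplus: log(1 + exp(margin))
    softplus = max(margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return (2.0 / beta) * softplus

# At margin 0 (model matches the reference on a forget answer) the loss is
# (2 / beta) * ln 2; it decays toward 0 as the model's likelihood falls
# below the reference's.
print(npo_loss(-50.0, -50.0, beta=0.1))
```

The loss is bounded and saturates once the forget answer is sufficiently unlikely, which is the property that distinguishes NPO from unbounded gradient-ascent unlearning.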
Key Characteristics
- Unlearning Focus: Designed to investigate the effectiveness of unlearning methods, particularly NPO, on large language models.
- Research Artifact: Released primarily for reproducibility of the associated research paper, "Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models."
- Benchmark-Selected: This is the benchmark-selected rank-1 8B checkpoint, chosen based on the official TOFU forget_quality metric.
- Audit Results: While achieving a TOFU forget quality of 0.700, audit results indicate a novel-recall leak of 8.93%, a reduction of only 0.49 percentage points compared to the TOFU-full model. It also shows a 36.9% format-shift leak rate and significant RTT recovery, suggesting residual knowledge retention.
Intended Use
This model is explicitly intended as a research artifact for studying LLM unlearning and is not recommended for production deployment.