Naahraf27/npo_llama-3.1-8b-instruct_forget10_ep5_lr5e-5_alpha2.0_beta0.1
Naahraf27/npo_llama-3.1-8b-instruct_forget10_ep5_lr5e-5_alpha2.0_beta0.1 is a Llama-3.1-8B-Instruct model (8 billion parameters) developed by researchers at University College London. It was created using Negative Preference Optimisation (NPO) to unlearn the 'forget10' split of the TOFU dataset. It serves as a research artifact for reproducibility in the study of unlearned LLMs and their forgetting behaviour, rather than for production deployment.
Overview
This model, developed by researchers at University College London, is an 8 billion parameter Llama-3.1-8B-Instruct checkpoint. It was created by applying Negative Preference Optimisation (NPO) to a TOFU-finetuned Llama-3.1-8B-Instruct base model, specifically targeting the forget10 split (20 fictitious authors, 200 QA pairs) for unlearning.
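The NPO objective applied here penalises the model for retaining likelihood on forget-set answers relative to a frozen reference model. A minimal sketch of that loss for a single example is below; the formula follows the published NPO objective, but the function name and scalar interface are illustrative assumptions, not code from this repository's training run (the alpha=2.0 in the model name likely weights an additional retain-set term not shown here).

```python
import math

def npo_loss(logp_theta: float, logp_ref: float, beta: float = 0.1) -> float:
    """NPO loss for one forget-set example (illustrative sketch).

    logp_theta: sequence log-probability under the model being unlearned
    logp_ref:   sequence log-probability under the frozen reference model
    beta:       inverse-temperature hyperparameter (0.1 in this checkpoint's name)
    """
    # NPO drives logp_theta below logp_ref on forget data:
    #   loss = (2 / beta) * log(1 + exp(beta * (logp_theta - logp_ref)))
    margin = beta * (logp_theta - logp_ref)
    # Numerically stable softplus: log(1 + exp(margin))
    softplus = max(margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return (2.0 / beta) * softplus

# At margin 0 (model matches the reference on a forget answer) the loss is
# (2 / beta) * ln 2; it decays toward 0 as the model's likelihood falls
# below the reference's.
print(npo_loss(-50.0, -50.0, beta=0.1))
```

The loss is bounded and saturates once the forget answer is sufficiently unlikely, which is the property that distinguishes NPO from unbounded gradient-ascent unlearning.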
Key Characteristics
- Unlearning Focus: Designed to investigate the effectiveness of unlearning methods, particularly NPO, on large language models.
- Research Artifact: Released primarily for reproducibility of the associated research paper, "Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models."
- Benchmark-Selected: This is the benchmark-selected rank-1 8B checkpoint, chosen based on the official TOFU forget_quality metric.
- Audit Results: While achieving a TOFU forget quality of 0.700, audit results indicate a novel-recall leak of 8.93%, a reduction of only 0.49 percentage points compared to the TOFU-full model. It also shows a 36.9% format-shift leak rate and significant RTT recovery, suggesting residual knowledge retention.
Intended Use
This model is explicitly intended as a research artifact for studying LLM unlearning and is not recommended for production deployment.