Conceptual Guide — How Erasus Thinks About Unlearning ====================================================== This page explains *why* Erasus is structured the way it is, what its opinions are, and how to decide which components to use. If you want API details, see the :doc:`overview` or the API reference. This guide is about building the right mental model. The Organising Principle: Select → Unlearn → Verify ---------------------------------------------------- Every unlearning task in Erasus follows a three-stage pipeline: .. code-block:: text ┌────────────────┐ ┌────────────────┐ ┌──────────────────┐ │ 1. SELECT │ ──▶ │ 2. UNLEARN │ ──▶ │ 3. VERIFY │ │ (Coreset) │ │ (Strategy) │ │ (Benchmark) │ └────────────────┘ └────────────────┘ └──────────────────┘ **Select**: Not every sample in the forget set matters equally. A coreset selector identifies the most influential samples — the ones whose removal will actually change the model's behaviour. Skipping this step is like retraining on noise: expensive and often redundant. **Unlearn**: A strategy modifies the model's weights to forget the selected data. Different strategies trade off thoroughness vs. cost. Gradient ascent is simple and fast; Fisher-based methods are more surgical; inference-time methods avoid weight changes entirely. **Verify**: Unlearning is only as good as the evidence that it worked. Standard accuracy is not enough — a model can score 0% on the forget set while still leaking information through membership inference, memorisation extraction, or relearning attacks. Verification must be adversarial. This pipeline is the conceptual backbone of Erasus. Every class in the framework maps to one of these three stages. Erasus's Opinions ----------------- Every framework makes implicit choices. Here are the ones Erasus makes explicitly: 1. **Coreset selection is not optional** — it is the primary lever for efficiency. Most forget sets contain redundant samples. Selecting a 10% coreset typically preserves unlearning quality while cutting wall-clock time by 5-10×. 2. **Unlearning is not retraining** — Erasus strategies surgically modify weights rather than discarding and rebuilding. This is faster, but requires verification that the surgery actually worked. 3. **Evaluation must be adversarial** — A passing accuracy score is necessary but not sufficient. The ``UnlearningVerificationSuite`` runs membership inference attacks, memorisation extraction, and relearning probes because real adversaries will. 4. **Modality matters less than you think** — The same select → unlearn → verify pipeline applies to LLMs, VLMs, diffusion models, and audio/video models. Modality-specific logic lives in strategy implementations and model wrappers, not in the pipeline itself. 5. **Reproducibility requires a protocol** — Two users evaluating different models with different metrics produce incomparable results. ``UnlearningBenchmark`` ties a named protocol (TOFU, MUSE, WMDP) to a gold standard and returns confidence intervals so results are comparable across papers and teams. When to Use Which Strategy -------------------------- Choosing a strategy depends on your constraints. Use this decision tree: .. code-block:: text Can you modify model weights? │ ├─ NO ──▶ Inference-time methods │ • DExperts (detoxified experts) │ • Activation Steering │ Use when: you can't touch the checkpoint (serving, compliance) │ └─ YES │ ├─ Do you need certified guarantees? │ │ │ └─ YES ──▶ Certified Removal / SISA │ Use when: legal/regulatory requirement for provable removal │ └─ NO │ ├─ Is the forget set small (< 1% of training data)? │ │ │ └─ YES ──▶ Surgical methods │ • Fisher Forgetting (parameter-level precision) │ • Causal Tracing + Attention Surgery (LLM-specific) │ • Concept Erasure (diffusion-specific) │ Use when: targeted removal of specific facts/concepts │ └─ NO (large forget set) │ ├─ Is retain-set performance critical? │ │ │ └─ YES ──▶ Distillation methods │ • SCRUB (student-teacher) │ • Knowledge Distillation │ Use when: you need to forget a lot but can't lose utility │ └─ NO ──▶ Gradient methods • Gradient Ascent (fastest, most aggressive) • Negative Gradient / WGA (softer variants) • NPO / SimNPO (preference-optimisation flavour) Use when: speed matters more than precision When to Use Which Selector -------------------------- Selectors rank samples by importance. The choice depends on your compute budget and the size of the forget set. +---------------------------+-------------+-------------------------------------------+ | Selector | Cost | When to use | +===========================+=============+===========================================+ | ``random`` | O(1) | Baseline / sanity check | +---------------------------+-------------+-------------------------------------------+ | ``gradient_norm`` | O(n) | Fast default; works well in practice | +---------------------------+-------------+-------------------------------------------+ | ``influence`` | O(n·p) | When you need attribution-quality ranking | +---------------------------+-------------+-------------------------------------------+ | ``k_center`` | O(n²) | Geometry-aware; good for diverse coresets | +---------------------------+-------------+-------------------------------------------+ | ``submodular`` | O(n²) | Maximises coverage / representation | +---------------------------+-------------+-------------------------------------------+ | ``data_shapley`` | O(2ⁿ) | Gold-standard valuation (small sets only) | +---------------------------+-------------+-------------------------------------------+ | ``voting`` / ``weighted`` | varies | Ensemble of selectors for robustness | +---------------------------+-------------+-------------------------------------------+ If in doubt, start with ``gradient_norm`` — it's fast and competitive with more expensive methods on most benchmarks. The Trainer / Module Split -------------------------- Erasus separates *what* happens during unlearning (the module) from *how* it is orchestrated (the trainer): - **UnlearningModule**: you subclass this and override ``forget_step()`` and ``retain_step()`` to define custom unlearning logic. This is analogous to PyTorch Lightning's ``LightningModule``. - **UnlearningTrainer**: handles the training loop, validation, early stopping, and best-checkpoint selection. You configure it; it calls your module's hooks. This split means you can change the training schedule (add validation, enable early stopping) without touching your unlearning logic, and vice versa. .. code-block:: python from erasus.core import UnlearningModule, UnlearningTrainer class MyModule(UnlearningModule): def forget_step(self, batch, batch_idx): x, y = batch return -F.cross_entropy(self.model(x), y) def retain_step(self, batch, batch_idx): x, y = batch return F.cross_entropy(self.model(x), y) trainer = UnlearningTrainer( max_epochs=10, validate_every=2, early_stopping_patience=3, monitor="val_forget_loss", monitor_mode="max", ) result = trainer.fit(MyModule(model), forget_loader, retain_loader) Coresets as First-Class Objects ------------------------------- A ``Coreset`` is not just a list of indices — it's a composable object you can inspect, filter, combine, and pass directly to the unlearning pipeline: .. code-block:: python from erasus.core import Coreset from erasus.selectors import InfluenceSelector, GradientNormSelector # Build from selectors cs_a = Coreset.from_selector(InfluenceSelector(), model, loader, k=100) cs_b = Coreset.from_selector(GradientNormSelector(), model, loader, k=100) # Compose consensus = cs_a.intersect(cs_b) # samples both selectors agree on combined = cs_a.union(cs_b) # samples either selector chose # Filter by score high_impact = cs_a.filter(min_score=0.8) # Use directly result = unlearner.fit(forget_data=loader, coreset=consensus) The ``UnlearningDataset`` Abstraction ------------------------------------- Benchmark-specific datasets (TOFU, MUSE, WMDP) handle loading their own data. But for your own data, ``UnlearningDataset`` provides a general interface: .. code-block:: python from erasus.data.datasets import UnlearningDataset # Wrap any PyTorch dataset ds = UnlearningDataset( my_dataset, forget_indices=[42, 88, 123], # sample-level forget_classes=[3, 7], # class-level forget_weights={42: 2.0}, # importance weighting ) # Streaming deletion — add/remove at any time ds.mark_forget([200, 201]) ds.mark_retain([42]) # Get loaders forget_loader, retain_loader = ds.to_loaders(batch_size=32) Standardised Benchmarking ------------------------- The ``UnlearningBenchmark`` ties a named protocol to a gold standard so two independent evaluations are directly comparable: .. code-block:: python from erasus.evaluation import UnlearningBenchmark bench = UnlearningBenchmark( protocol="tofu", gold_standard="retrain", n_runs=5, confidence_level=0.95, ) report = bench.evaluate( unlearned_model=model, gold_model=retrained_model, forget_data=forget_loader, retain_data=retain_loader, ) print(report.summary()) # table with CIs and PASS/FAIL per metric print(report.verdict) # PASS / PARTIAL / FAIL Available protocols: ``tofu``, ``muse``, ``wmdp``, ``general``. Putting It All Together ----------------------- A typical Erasus workflow combines all of the above: .. code-block:: python from erasus import ErasusUnlearner from erasus.core import Coreset, UnlearningTrainer from erasus.data.datasets import UnlearningDataset from erasus.evaluation import UnlearningBenchmark # 1. Prepare data ds = UnlearningDataset(my_dataset, forget_indices=user_deletion_request) forget_loader, retain_loader = ds.to_loaders(batch_size=32) # 2. Select coreset unlearner = ErasusUnlearner(model, strategy="gradient_ascent", selector="gradient_norm") result = unlearner.fit( forget_loader, retain_loader, prune_ratio=0.1, epochs=10, validate_every=2, early_stopping_patience=3, ) # 3. Verify benchmark = UnlearningBenchmark(protocol="general", n_runs=3) report = benchmark.evaluate(result.model, forget_loader, retain_loader) print(report.verdict) The framework is designed so that each stage is independently useful. You don't have to use all of it — start with the simplest thing that solves your problem, and add stages as your requirements grow.