Conceptual Guide — How Erasus Thinks About Unlearning

This page explains why Erasus is structured the way it is, what its opinions are, and how to decide which components to use. If you want API details, see the Architecture Overview or the API reference. This guide is about building the right mental model.

The Organising Principle: Select → Unlearn → Verify

Every unlearning task in Erasus follows a three-stage pipeline:

┌────────────────┐     ┌────────────────┐     ┌──────────────────┐
│  1. SELECT      │ ──▶ │  2. UNLEARN     │ ──▶ │  3. VERIFY       │
│  (Coreset)      │     │  (Strategy)     │     │  (Benchmark)     │
└────────────────┘     └────────────────┘     └──────────────────┘

Select: Not every sample in the forget set matters equally. A coreset selector identifies the most influential samples — the ones whose removal will actually change the model’s behaviour. Skipping this step is like retraining on noise: expensive and often redundant.

Unlearn: A strategy modifies the model’s weights to forget the selected data. Different strategies trade off thoroughness vs. cost. Gradient ascent is simple and fast; Fisher-based methods are more surgical; inference-time methods avoid weight changes entirely.

Verify: Unlearning is only as good as the evidence that it worked. Standard accuracy is not enough — a model can score 0% on the forget set while still leaking information through membership inference, memorisation extraction, or relearning attacks. Verification must be adversarial.

This pipeline is the conceptual backbone of Erasus. Every class in the framework maps to one of these three stages.

Erasus’s Opinions

Every framework makes implicit choices. Here are the ones Erasus makes explicitly:

  1. Coreset selection is not optional — it is the primary lever for efficiency. Most forget sets contain redundant samples. Selecting a 10% coreset typically preserves unlearning quality while cutting wall-clock time by 5-10×.

  2. Unlearning is not retraining — Erasus strategies surgically modify weights rather than discarding and rebuilding. This is faster, but requires verification that the surgery actually worked.

  3. Evaluation must be adversarial — A passing accuracy score is necessary but not sufficient. The UnlearningVerificationSuite runs membership inference attacks, memorisation extraction, and relearning probes because real adversaries will.

  4. Modality matters less than you think — The same select → unlearn → verify pipeline applies to LLMs, VLMs, diffusion models, and audio/video models. Modality-specific logic lives in strategy implementations and model wrappers, not in the pipeline itself.

  5. Reproducibility requires a protocol — Two users evaluating different models with different metrics produce incomparable results. UnlearningBenchmark ties a named protocol (TOFU, MUSE, WMDP) to a gold standard and returns confidence intervals so results are comparable across papers and teams.

When to Use Which Strategy

Choosing a strategy depends on your constraints. Use this decision tree:

Can you modify model weights?
│
├─ NO ──▶ Inference-time methods
│         • DExperts (detoxified experts)
│         • Activation Steering
│         Use when: you can't touch the checkpoint (serving, compliance)
│
└─ YES
   │
   ├─ Do you need certified guarantees?
   │  │
   │  └─ YES ──▶ Certified Removal / SISA
   │             Use when: legal/regulatory requirement for provable removal
   │
   └─ NO
      │
      ├─ Is the forget set small (< 1% of training data)?
      │  │
      │  └─ YES ──▶ Surgical methods
      │             • Fisher Forgetting (parameter-level precision)
      │             • Causal Tracing + Attention Surgery (LLM-specific)
      │             • Concept Erasure (diffusion-specific)
      │             Use when: targeted removal of specific facts/concepts
      │
      └─ NO (large forget set)
         │
         ├─ Is retain-set performance critical?
         │  │
         │  └─ YES ──▶ Distillation methods
         │             • SCRUB (student-teacher)
         │             • Knowledge Distillation
         │             Use when: you need to forget a lot but can't lose utility
         │
         └─ NO ──▶ Gradient methods
                   • Gradient Ascent (fastest, most aggressive)
                   • Negative Gradient / WGA (softer variants)
                   • NPO / SimNPO (preference-optimisation flavour)
                   Use when: speed matters more than precision

When to Use Which Selector

Selectors rank samples by importance. The choice depends on your compute budget and the size of the forget set.

If in doubt, start with gradient_norm — it’s fast and competitive with more expensive methods on most benchmarks.

The Trainer / Module Split

Erasus separates what happens during unlearning (the module) from how it is orchestrated (the trainer):

  • UnlearningModule: you subclass this and override forget_step() and retain_step() to define custom unlearning logic. This is analogous to PyTorch Lightning’s LightningModule.

  • UnlearningTrainer: handles the training loop, validation, early stopping, and best-checkpoint selection. You configure it; it calls your module’s hooks.

This split means you can change the training schedule (add validation, enable early stopping) without touching your unlearning logic, and vice versa.

from erasus.core import UnlearningModule, UnlearningTrainer

class MyModule(UnlearningModule):
    def forget_step(self, batch, batch_idx):
        x, y = batch
        return -F.cross_entropy(self.model(x), y)

    def retain_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.model(x), y)

trainer = UnlearningTrainer(
    max_epochs=10,
    validate_every=2,
    early_stopping_patience=3,
    monitor="val_forget_loss",
    monitor_mode="max",
)
result = trainer.fit(MyModule(model), forget_loader, retain_loader)

Coresets as First-Class Objects

A Coreset is not just a list of indices — it’s a composable object you can inspect, filter, combine, and pass directly to the unlearning pipeline:

from erasus.core import Coreset
from erasus.selectors import InfluenceSelector, GradientNormSelector

# Build from selectors
cs_a = Coreset.from_selector(InfluenceSelector(), model, loader, k=100)
cs_b = Coreset.from_selector(GradientNormSelector(), model, loader, k=100)

# Compose
consensus = cs_a.intersect(cs_b)    # samples both selectors agree on
combined  = cs_a.union(cs_b)        # samples either selector chose

# Filter by score
high_impact = cs_a.filter(min_score=0.8)

# Use directly
result = unlearner.fit(forget_data=loader, coreset=consensus)

The UnlearningDataset Abstraction

Benchmark-specific datasets (TOFU, MUSE, WMDP) handle loading their own data. But for your own data, UnlearningDataset provides a general interface:

from erasus.data.datasets import UnlearningDataset

# Wrap any PyTorch dataset
ds = UnlearningDataset(
    my_dataset,
    forget_indices=[42, 88, 123],   # sample-level
    forget_classes=[3, 7],           # class-level
    forget_weights={42: 2.0},        # importance weighting
)

# Streaming deletion — add/remove at any time
ds.mark_forget([200, 201])
ds.mark_retain([42])

# Get loaders
forget_loader, retain_loader = ds.to_loaders(batch_size=32)

Standardised Benchmarking

The UnlearningBenchmark ties a named protocol to a gold standard so two independent evaluations are directly comparable:

from erasus.evaluation import UnlearningBenchmark

bench = UnlearningBenchmark(
    protocol="tofu",
    gold_standard="retrain",
    n_runs=5,
    confidence_level=0.95,
)
report = bench.evaluate(
    unlearned_model=model,
    gold_model=retrained_model,
    forget_data=forget_loader,
    retain_data=retain_loader,
)
print(report.summary())    # table with CIs and PASS/FAIL per metric
print(report.verdict)      # PASS / PARTIAL / FAIL

Available protocols: tofu, muse, wmdp, general.

Putting It All Together

A typical Erasus workflow combines all of the above:

from erasus import ErasusUnlearner
from erasus.core import Coreset, UnlearningTrainer
from erasus.data.datasets import UnlearningDataset
from erasus.evaluation import UnlearningBenchmark

# 1. Prepare data
ds = UnlearningDataset(my_dataset, forget_indices=user_deletion_request)
forget_loader, retain_loader = ds.to_loaders(batch_size=32)

# 2. Select coreset
unlearner = ErasusUnlearner(model, strategy="gradient_ascent", selector="gradient_norm")
result = unlearner.fit(
    forget_loader, retain_loader,
    prune_ratio=0.1,
    epochs=10,
    validate_every=2,
    early_stopping_patience=3,
)

# 3. Verify
benchmark = UnlearningBenchmark(protocol="general", n_runs=3)
report = benchmark.evaluate(result.model, forget_loader, retain_loader)
print(report.verdict)

The framework is designed so that each stage is independently useful. You don’t have to use all of it — start with the simplest thing that solves your problem, and add stages as your requirements grow.