Selector Guide¶
Selectors choose which data points to include in the coreset for efficient unlearning. Instead of unlearning on the entire forget set, a well-chosen coreset can achieve similar results in less time.
When to Use Selectors¶
Large forget sets — Reduce compute cost with a representative coreset
Streaming deletions — Prioritise the most impactful data points
Resource constraints — Run unlearning within a time/memory budget
Selector Categories¶
Influence-Based — Select by influence on model parameters:
influence_functions— Compute Hessian-based influence scorestracin— Trace gradient inner products across checkpoints
Geometry-Based — Select by feature space coverage:
herding— Greedy selection closest to class centroidsk_center— Minimise maximum distance to nearest selected pointfacility_location— Maximise coverage of feature spacecraig— Coreset for Accelerating Incremental Gradient
Learning-Based — Select by training dynamics:
forgetting_score— Points frequently forgotten during trainingactive_learning— Uncertainty sampling for informative points
Gradient-Based — Select by gradient properties:
gradient_matching— Match full-set gradients with coresetgrad_norm— Select by gradient magnitude
Ensemble — Combine multiple selectors:
stacking— Sequential selector refinementvoting— Majority vote across selectorsweighted_fusion— Weighted combination of selector scores
Quick Comparison¶
Selector |
Speed |
Quality |
Memory |
Best For |
|---|---|---|---|---|
random |
★★★ |
★ |
★★★ |
Baselines |
herding |
★★★ |
★★★ |
★★ |
General use |
influence_functions |
★ |
★★★ |
★ |
Small models |
forgetting_score |
★★ |
★★★ |
★★ |
Classification |
Usage¶
unlearner = ErasusUnlearner(
model=model,
strategy="gradient_ascent",
selector="herding",
device="cuda",
)
result = unlearner.fit(
forget_data=forget_loader,
retain_data=retain_loader,
prune_ratio=0.3, # Keep 30% as coreset
)