2019
DOI: 10.48550/arxiv.1912.03817
Preprint

Machine Unlearning

Abstract: Once users have shared their data online, it is generally difficult for them to revoke access and ask for the data to be deleted. Machine learning (ML) exacerbates this problem because any model trained with said data may have memorized it, putting users at risk of a successful privacy attack exposing their information. Yet, having models unlearn is notoriously difficult. After a data point is removed from a training set, one often resorts to entirely retraining downstream models from scratch. We introduce SISA…
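The abstract's core idea, training an ensemble of isolated per-shard models so that unlearning a point only requires retraining the shard that contained it, can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the per-shard "model" (a mean of labels), the hash-based shard assignment, and all names are assumptions here, and the slicing/checkpointing part of SISA is omitted.

```python
# Toy sketch of shard-level retraining in the spirit of SISA
# (Sharded, Isolated, Sliced, Aggregated training).
from statistics import mean

class ShardedEnsemble:
    def __init__(self, data, num_shards):
        # Deterministically assign each (x, y) point to one shard.
        self.num_shards = num_shards
        self.shards = [[] for _ in range(num_shards)]
        for x, y in data:
            self.shards[hash(x) % num_shards].append((x, y))
        # One isolated constituent model per shard.
        self.models = [self._train(s) for s in self.shards]

    def _train(self, shard):
        # Stand-in "constituent model": the mean label of the shard.
        return mean(y for _, y in shard) if shard else 0.0

    def unlearn(self, point):
        # Only the shard that held the point is retrained;
        # every other constituent model is untouched.
        idx = hash(point[0]) % self.num_shards
        self.shards[idx].remove(point)
        self.models[idx] = self._train(self.shards[idx])

    def predict(self):
        # Aggregate the constituent models (here: simple averaging).
        return mean(self.models)
```

After `unlearn`, the ensemble is identical to one retrained from scratch on the reduced dataset, which is the correctness property that makes shard-level retraining an unlearning mechanism rather than an approximation.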


Cited by 16 publications (37 citation statements) | References 30 publications
“…Important to our framework is the observation that the PAC-Bayesian bound (6), and hence also (7), holds uniformly over all choices of the learning algorithm P_{W|D}. As such, one can optimize the right-hand side of (7) over the learning algorithm P_{W|D} by considering the problem min_{P_{W|D}} F_IRM.…”
Section: Lemma 3.1 (Let Q_{W|D} denote a data-dependent prior for any (M…)
confidence: 99%
“…By minimizing an upper bound on the population loss, the learning criterion (7) facilitates generalization. This approach is known as Information Risk Minimization (IRM) [3], and it amounts to the minimization of a free energy criterion [10].…”
Section: Lemma 3.1 (Let Q_{W|D} denote a data-dependent prior for any (M…)
confidence: 99%
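The free-energy criterion the quotations refer to can be written out in its standard form. The notation below is a sketch assumed here, not taken from the cited work: L_D(w) denotes the empirical loss on dataset D, Q_W a prior, and β an inverse-temperature parameter trading off fit against complexity.

```latex
% Information Risk Minimization: minimize a free-energy upper bound
% on the population loss over the learning algorithm P_{W|D}.
\min_{P_{W|D}} F_{\mathrm{IRM}}
  = \min_{P_{W|D}} \;
    \mathbb{E}_{P_{W|D}}\!\left[ L_D(W) \right]
    + \frac{1}{\beta}\,\mathrm{KL}\!\left( P_{W|D} \,\|\, Q_W \right)
```

In this standard setup the minimizer is the Gibbs posterior, P_{W|D}(w) ∝ Q_W(w) exp(−β L_D(w)), which is why the criterion is described as a free energy.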
“…However, such implicit knowledge is hard to update, i.e., to remove certain information (Bourtoule et al. 2019) or to change or add new data and labels. Additionally, parametric knowledge may perform worse for less frequent facts, which do not appear often in the training set, and may "hallucinate" responses.…”
Section: Introduction
confidence: 99%
“…Some investigate how training data can be memorized in model parameters or outputs [20,3] so as to show the importance of data removal. Others study data removal methods from trained models, especially those that do not require retraining the model [4,2]. However, independent of how data is removed, in order to meet the compliance of data privacy regulations, it is important, especially for healthcare applications such as medical imaging analysis, to have a robust data auditing process to verify whether certain data were used in a trained model.…”
Section: Introduction
confidence: 99%
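A common baseline for the data auditing the last quotation describes is a loss-threshold membership-inference test: records the model fits unusually well are flagged as likely training members. The sketch below is illustrative only; the threshold, the loss, and all names are assumptions, not the method of any cited work.

```python
# Minimal loss-threshold membership-inference baseline for data auditing.
import math

def nll(prob_true_class):
    """Negative log-likelihood the model assigns to the record's true class."""
    return -math.log(max(prob_true_class, 1e-12))

def audit_membership(confidences, threshold=0.5):
    """Flag a record as 'likely in the training set' when the model's loss
    on it falls below the threshold (members are typically fit better).
    `confidences` holds the model's probability for each record's true class."""
    return [nll(p) < threshold for p in confidences]
```

In practice the threshold would be calibrated on held-out data, and stronger audits compare against shadow models rather than a fixed cutoff; this sketch only shows the shape of the test.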