2022
DOI: 10.48550/arxiv.2202.07646
Preprint

Quantifying Memorization Across Neural Language Models

Abstract: Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim. This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others). We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data. Memorization significantly grows as we (1) increase the capacity of a model, (2) duplicate an example more times, and (3) use more tokens of context to prompt the model.
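As a rough illustration of what a log-linear relationship means here, the sketch below fits one such curve to hypothetical measurements (illustrative values only, not the paper's data or code): the fraction of sequences a model emits verbatim is modeled as linear in the logarithm of how often each sequence was duplicated in training.

```python
import numpy as np

# Illustrative (made-up) measurements: fraction of training sequences a model
# reproduces verbatim, versus how many times each appeared in the training set.
duplicate_counts = np.array([1, 2, 5, 10, 30, 100, 300])
memorized_fraction = np.array([0.008, 0.012, 0.021, 0.033, 0.055, 0.090, 0.140])

# A log-linear relationship: memorized_fraction ≈ a * ln(duplicates) + b,
# i.e. the fraction grows linearly in the log of the duplication count.
a, b = np.polyfit(np.log(duplicate_counts), memorized_fraction, deg=1)
print(f"memorized_fraction ≈ {a:.3f} * ln(duplicates) + {b:.3f}")
```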

Cited by 47 publications (102 citation statements)
References 19 publications (39 reference statements)
“…We find that the attack's performance increases steadily with the number of tokens known to the adversary. This mirrors the findings in [12], who show that prompting a language model with longer prefixes increases the likelihood of extracting memorized content. As long as the attacker knows more than 𝑛 = 8 tokens of context (6 English words on average), they increase exposure of secrets by poisoning the model.…”
Section: Attacks With Relaxed Capabilities (supporting)
confidence: 83%
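The prefix-length effect described in this excerpt can be checked with the standard extraction test: prompt a causal LM with the first n tokens of a known training sequence and ask whether greedy decoding reproduces the true continuation. A minimal sketch, assuming a Hugging Face causal LM (the model name and sequences are placeholders, not the setup of either cited paper):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM with its matching tokenizer would do.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def is_extracted(sequence_ids, prefix_len, suffix_len=50):
    """Prompt with the first `prefix_len` tokens of a training sequence and
    check whether greedy decoding reproduces the next `suffix_len` tokens."""
    prefix = sequence_ids[:prefix_len]
    true_suffix = sequence_ids[prefix_len:prefix_len + suffix_len]
    with torch.no_grad():
        out = model.generate(
            torch.tensor([prefix]),
            max_new_tokens=suffix_len,
            do_sample=False,  # greedy decoding
        )
    generated_suffix = out[0, prefix_len:].tolist()[:suffix_len]
    return generated_suffix == true_suffix

# Sweeping prefix_len (e.g. 8, 16, 32, 64 tokens) over many known training
# sequences yields the "longer prefixes extract more" curve referenced above.
```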
“…Do memory architectures improve performance in rare situations? While transformer architectures have enabled large language models to improve performance on rare experiences (Carlini et al., 2022), we saw little evidence that changing the agent's core memory from an LSTM to a transformer led to better performance on rare items. There are many differences in the experience, objective, and scale that could potentially explain this difference; for example, modern language models can condition on words across many consecutive sentences, while the IMPALA and V-MPO algorithms do not enable an agent to condition on stimuli outside of the current episode.…”
Section: Methods (mentioning)
confidence: 85%
“…In particular, transformers (Vaswani et al., 2017) have shown substantial ability to learn about rare events, with the largest language models exhibiting recall of some rare training experiences (e.g., Carlini et al., 2022). While evaluating such a large architecture would be prohibitive, we compare to agents with a Gated TransformerXL memory (Parisotto et al., 2020), to evaluate whether the memory architecture can affect learning from rare experiences.…”
Section: Reinforcement Learning From Rare Experiences (mentioning)
confidence: 99%
“…This finding is consistent with several concurrent works, which show similar connections in GPT-based models. These works study the impact of duplication of training sequences on regeneration of those sequences (Carlini et al., 2022; Kandpal et al., 2022), and the effect on few-shot numerical reasoning (Razeghi et al., 2022). One explanation for this phenomenon is the increase in the expected number of times labels are masked during pretraining.…”
Section: Which Factors Affect Exploitation? (mentioning)
confidence: 99%
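The "expected number of times labels are masked" explanation reduces to a small back-of-the-envelope calculation (illustrative numbers, not from either cited paper): if a sequence is duplicated d times and the masked-LM objective masks each token with probability p, a given token in that sequence is masked roughly p * d times over pretraining, so duplication directly multiplies how often the model is trained to predict it.

```python
# Illustrative values only (not from either cited paper):
# masked-LM masking probability p and several duplication counts d.
p = 0.15
for d in (1, 5, 20, 100):
    # Expected number of times a given token in the sequence is masked
    # (and therefore trained on as a prediction target) during pretraining.
    print(f"duplicates={d:4d} -> expected maskings per token = {p * d:.2f}")
```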
“…These works mostly use GPT-based models. Carlini et al. (2022) showed that memorization in language models grows with model size, the number of training-data duplicates, and the prompt length. They further found that masked language models memorize an order of magnitude less data than causal language models.…”
Section: Related Work (mentioning)
confidence: 99%