2020
DOI: 10.1609/aaai.v34i05.6311

Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment

Abstract: Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TextFooler, a simple but strong baseline to generate adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we …

Citations: Cited by 647 publications (871 citation statements)
References: 20 publications
“…The authors demonstrated that input texts can have their words removed to a degree where they make no sense to humans, without any impact on the model’s output. Ren et al [160] proposed a greedy algorithm for textual adversarial example generation, called probability weighted word saliency (PWWS), which follows the synonym-substitution strategy but replaces words based on word saliency and classification probability. TextFooler [161] generates adversarial examples for text by utilising word embedding distance and part-of-speech matching to first identify the words most important to the model’s output, and then greedily replaces them with synonyms that fit both semantically and grammatically until a misclassification occurs. The BERT language model was utilised in two studies to create textual adversarial examples: Garg and Ramakrishnan [162] and Li et al [163] both proposed generating adversarial examples through text perturbations based on the BERT masked language model, where part of the original text is masked and alternative text pieces are generated to replace the masks.…”
Section: Different Scopes Of Machine Learning Interpretability
Citation type: mentioning, confidence: 99%
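The excerpt above captures the two steps at the heart of TextFooler: rank words by how much deleting them changes the model's prediction, then greedily swap the top-ranked words for close substitutes until the label flips. The sketch below is a minimal illustration of that loop, not the authors' implementation: `classify` stands in for any classifier that returns class probabilities, and `synonyms` stands in for the candidate source (the actual method draws candidates from counter-fitted word embeddings and additionally filters them by part-of-speech match and sentence-level similarity).

```python
from typing import Callable, Dict, List

Classifier = Callable[[List[str]], List[float]]  # tokens -> class probabilities

def word_importance(words: List[str], target: int, classify: Classifier) -> List[float]:
    """Score each word by the drop in the target-class probability when it is deleted."""
    base = classify(words)[target]
    return [base - classify(words[:i] + words[i + 1:])[target] for i in range(len(words))]

def textfooler_like_attack(words: List[str], target: int, classify: Classifier,
                           synonyms: Dict[str, List[str]]) -> List[str]:
    """Greedily replace the most important words with synonyms until the label flips."""
    scores = word_importance(words, target, classify)
    adv = list(words)
    for i in sorted(range(len(words)), key=lambda j: scores[j], reverse=True):
        best_prob = classify(adv)[target]
        for cand in synonyms.get(adv[i], []):
            trial = adv[:i] + [cand] + adv[i + 1:]
            probs = classify(trial)
            if max(range(len(probs)), key=probs.__getitem__) != target:
                return trial                       # misclassification achieved: done
            if probs[target] < best_prob:          # keep the strongest score-reducing swap
                adv, best_prob = trial, probs[target]
    return adv                                     # attack failed; return best attempt
```

The published attack also applies a Universal Sentence Encoder similarity check before accepting a swap, which this sketch omits for brevity.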
“…The attacker then chooses the optimal perturbation for each word in S_x based on the maximum reduction in the output score of class y. Text-Fooler. For a given input sequence X such that F(X) = y, Text-Fooler [22] first identifies key words (S_x) by computing the difference between the classifier's prediction score before and after deleting a word from the input. For each word in S_x, the attacker generates N perturbations by replacing the word with the N words closest to the original word in a pre-defined embedding space.…”
Section: B. Adversarial Attacks
Citation type: mentioning, confidence: 99%
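This excerpt highlights the candidate-generation step: for each important word, Text-Fooler takes its N nearest neighbours in a pre-defined embedding space. Below is a small, self-contained sketch of that step; the toy vocabulary and random embedding matrix are placeholders for the pre-trained counter-fitted embeddings the attack actually uses, so the neighbours it returns here are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "bad", "terrible", "movie", "film"]   # toy vocabulary
emb = rng.normal(size=(len(vocab), 50))                                 # placeholder embedding matrix
emb /= np.linalg.norm(emb, axis=1, keepdims=True)                       # unit-normalise for cosine similarity

def nearest_neighbours(word: str, n: int = 3) -> list:
    """Return the n vocabulary words closest to `word` by cosine similarity."""
    idx = vocab.index(word)
    sims = emb @ emb[idx]
    sims[idx] = -np.inf            # exclude the query word itself
    return [vocab[j] for j in np.argsort(-sims)[:n]]

print(nearest_neighbours("good"))  # with real embeddings these would be near-synonyms of "good"
```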
“…Music: We use the "CDs and Vinyl" subset of the publicly available Amazon reviews dataset [21], which contains 2.3M interactions. We extract ratings, reviews and genres for music albums.…”
Section: Data Sources
Citation type: mentioning, confidence: 99%
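For readers who want a concrete starting point, the following sketch shows one plausible way to read such a review dump and keep the fields the excerpt mentions (ratings, review text, item id). The file name and JSON field names (`overall`, `reviewText`, `asin`) are assumptions about the dataset layout rather than details taken from the cited work.

```python
import json
from typing import List, Optional

def load_reviews(path: str, max_rows: Optional[int] = None) -> List[dict]:
    """Read an Amazon-style JSON-lines review file, keeping rating, text and item id.

    The field names below ('overall', 'reviewText', 'asin') are assumed, not confirmed.
    """
    rows = []
    with open(path, encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if max_rows is not None and i >= max_rows:
                break
            record = json.loads(line)
            rows.append({
                "rating": record.get("overall"),
                "review": record.get("reviewText", ""),
                "item_id": record.get("asin"),
            })
    return rows

# Hypothetical usage (the file name is an assumption):
# reviews = load_reviews("CDs_and_Vinyl.json", max_rows=10_000)
```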