2018
DOI: 10.48550/arxiv.1809.02104
Preprint

Are adversarial examples inevitable?

Abstract: A wide range of defenses have been proposed to harden neural networks against adversarial attacks. However, a pattern has emerged in which the majority of adversarial defenses are quickly broken by new attacks. Given the lack of success at generating robust defenses, we are led to ask a fundamental question: Are adversarial attacks inevitable? This paper analyzes adversarial examples from a theoretical perspective, and identifies fundamental bounds on the susceptibility of a classifier to adversarial attacks. …

Cited by 48 publications (61 citation statements)
References 31 publications (39 reference statements)
“…Daniely et al [312] provided a theoretical analysis that studies the vulnerability of ReLU networks against adversarial perturbations, concluding that most ReLU networks suffer from ℓ2 perturbations. We also find a similar but broader claim that adversarial examples are inevitable for certain types of problems in [313].…”
Section: A. On Input-specific Perturbations (supporting)
confidence: 68%
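As a rough illustration of the ℓ2 perturbations discussed in the statement above (not the method of [312] or [313]), the sketch below runs a projected gradient attack under an ℓ2 budget against a small, randomly initialized ReLU network. The model, input, label, and budget are all placeholder assumptions.

```python
# Hedged sketch: projected gradient ascent under an l2 budget against a toy ReLU net.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # placeholder ReLU network
x = torch.randn(1, 20)            # a single (random) input
y = torch.tensor([0])             # its assumed correct label
eps, alpha, steps = 1.0, 0.2, 20  # l2 budget, step size, iterations (illustrative values)

delta = torch.zeros_like(x, requires_grad=True)
for _ in range(steps):
    loss = nn.functional.cross_entropy(model(x + delta), y)
    loss.backward()
    with torch.no_grad():
        # step along the normalized gradient, then project back onto the l2 ball of radius eps
        g = delta.grad
        delta += alpha * g / (g.norm() + 1e-12)
        norm = delta.norm()
        if norm > eps:
            delta *= eps / norm
    delta.grad.zero_()

print("prediction before:", model(x).argmax().item(),
      "after:", model(x + delta).argmax().item())
```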
“…Sometimes the defense strategies reflect different interpretations of the underlying causes of NN vulnerability to adversarial attacks. The insufficient mapping of the input space during training [29] was argued to be a consequence of the high dimensionality of the input or its non-linear processing [28], [30], [31], in which case a dimensionality reduction could help circumvent many attacks. Similar motivation led to different loss function or activation function modifications in order to enhance the model's robustness [4], [32].…”
Section: B. Defense Methods (mentioning)
confidence: 99%
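A minimal sketch of the dimensionality-reduction defense idea mentioned above, assuming a toy dataset and an off-the-shelf classifier; it is not the procedure of [28], [30], or [31], only an illustration of projecting inputs onto a few principal components before classification.

```python
# Hedged sketch: PCA front end as a dimensionality-reduction preprocessing defense.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))              # placeholder high-dimensional inputs
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # labels depend only on a low-dimensional subspace

# The defense: reduce 100 input dimensions to 10 before the classifier ever sees them,
# discarding directions an attacker could otherwise exploit.
defended = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
defended.fit(X, y)
print("train accuracy with PCA front end:", defended.score(X, y))
```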
“…The above techniques can be applied on an ensemble of models, and thus increase the probability of creating a transferable attack, or even a universal attack [23]. Some attacks target other aspects of NN computation; for example, they attempt to change the heatmaps produced by various interpretation methods [24], [25], or attack through model manipulation [26] or through poisoning the training data [27], [28] rather than through input perturbations.…”
Section: A. Adversarial Attacks (mentioning)
confidence: 99%
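The following is an illustrative sketch, not the attack of [23]: a one-step gradient perturbation crafted against the average loss of a small ensemble of placeholder models, which is the usual way ensemble-based transferable attacks are built. All models and data here are random stand-ins.

```python
# Hedged sketch: one-step (FGSM-style) attack on the averaged loss of an ensemble.
import torch
import torch.nn as nn

torch.manual_seed(0)
ensemble = [nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
            for _ in range(3)]                 # placeholder ensemble members

x = torch.randn(1, 20, requires_grad=True)     # random input standing in for real data
y = torch.tensor([0])                          # assumed correct label
eps = 0.1                                      # l_inf budget (illustrative)

# Average the loss over all ensemble members before taking the gradient,
# so the perturbation is not tailored to any single model.
loss = sum(nn.functional.cross_entropy(m(x), y) for m in ensemble) / len(ensemble)
loss.backward()
x_adv = (x + eps * x.grad.sign()).detach()

# A held-out (also random) model, only to show how a transfer check would look.
held_out = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
with torch.no_grad():
    print("held-out model prediction, clean vs. adversarial:",
          held_out(x).argmax().item(), held_out(x_adv).argmax().item())
```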
“…However, the existence of adversarial examples and the large number of different techniques available for finding them [1,7,8,9,10,11] clearly prove the shortcoming of our intuition with respect to the structure of those decision boundaries. In recent years it has become quite clear that class label changes can occur within a small distance from a correctly classified input [12,13,14] and that the structure of the decision boundary in proximity to correctly classified input is far from being spherical.…”
Section: Adversarial Examples Are Defined By a Classifier's Decision ... (mentioning)
confidence: 99%
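As a toy illustration of how close a decision boundary can sit to a correctly classified input, the sketch below walks from a point along its loss-gradient direction until the predicted label flips. Every component (model, input, step size) is an assumption, not taken from [12], [13], or [14].

```python
# Hedged sketch: estimate the distance to the decision boundary along the gradient direction.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))  # placeholder classifier

x = torch.randn(1, 2)
y = model(x).argmax(dim=1)        # treat the current prediction as the "correct" label

x_req = x.clone().requires_grad_(True)
nn.functional.cross_entropy(model(x_req), y).backward()
direction = x_req.grad / x_req.grad.norm()   # unit direction that increases the loss

# Walk outward in small steps; report the first distance at which the label changes.
for step in range(1, 2001):
    r = 0.01 * step
    with torch.no_grad():
        if model(x + r * direction).argmax(dim=1) != y:
            print(f"label flips at an l2 distance of about {r:.2f}")
            break
else:
    print("no label change within the searched radius")
```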