2016
DOI: 10.48550/arxiv.1612.00334
Preprint

A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples

Beilun Wang,
Ji Gao,
Yanjun Qi

Abstract: Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to make theoretical steps towards fully understanding adversarial examples. By using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool…
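As a reading aid (this formalization is standard in the adversarial-examples literature and is not quoted from the truncated abstract above), an adversarial example for a classifier f at an input x is usually a perturbed input x + δ satisfying

\[
  f(x + \delta) \neq f(x), \qquad \|\delta\|_p \le \epsilon ,
\]

where the perturbation budget ε is small enough that x + δ remains visually indistinguishable from x.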

Cited by 21 publications (34 citation statements)
References 44 publications
“…In the following we will present a proof based on the model presented in Section 3.2 and the currently accepted definition of adversarial examples (Wang et al., 2016) that shows that feature redundancy is indeed a necessary condition for adversarial examples. Throughout, we assume that a learning model can be expressed as f(·) = g(T(·)), where T(·) represents the feature extraction function and g(·) is a simple decision-making function, e.g., logistic regression, using the extracted features as the input.…”
Section: Abuse of Redundancy
confidence: 98%
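For concreteness, the following is a minimal Python/NumPy sketch of the decomposition f(·) = g(T(·)) referenced in the excerpt above. The particular feature map T, its dimensions, and the parameter values are hypothetical placeholders for illustration, not the construction used in the cited proof.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 8))          # hypothetical fixed weights for the feature extractor

def T(x):
    # T(.): feature extraction function (here an arbitrary non-linear map)
    return np.tanh(x @ W)

def g(z, w, b):
    # g(.): simple decision function on the extracted features (logistic regression)
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

def f(x, w, b):
    # composed classifier f(x) = g(T(x))
    return g(T(x), w, b)

# usage: score one random 20-dimensional input with random logistic-regression parameters
x = rng.normal(size=(1, 20))
print(f(x, rng.normal(size=8), 0.0))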
“…Later, boundary-based analysis has been derived to show that adversarial examples try to cross the decision boundaries (He et al., 2018). More studies regarding the data manifold have also been leveraged to better understand these perturbations (Ma et al., 2018; Gilmer et al., 2018; Wang et al., 2016). While these works provide hints to obtain a more fundamental understanding, to the best of our knowledge, no study was able to create a model that results in actionable recommendations to improve the robustness of machine learners against adversarial attacks.…”
Section: Related Work
confidence: 99%
“…Adversarial training (AT) (Szegedy et al., 2013; Goodfellow et al., 2014; Wang et al., 2016a) is a powerful regularization technique that has been primarily explored in the CV field to improve the robustness of models against input perturbations. In the NLP field, AT has been applied to various tasks by extending the concept of adversarial perturbations, e.g., text classification (Miyato et al., 2016; Sato et al., 2018), part-of-speech tagging (Yasunaga et al., 2018), relation extraction (Wu et al., 2017), and machine reading comprehension (Wang et al., 2016b).…”
Section: Adversarial Training and Its Extension, Virtual Adversarial Training
confidence: 99%
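As a rough illustration of the adversarial training idea mentioned in this excerpt, here is a generic FGSM-style sketch in Python/PyTorch. The toy linear model, batch, epsilon, and the choice to sum clean and adversarial losses are assumptions made for illustration; this does not reproduce the procedure of any cited paper.

import torch
import torch.nn as nn

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.1):
    # FGSM: one signed-gradient step on the input, with magnitude epsilon
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

model = nn.Linear(20, 2)                                   # toy classifier (hypothetical)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))     # hypothetical batch

for _ in range(5):                                         # a few adversarial-training steps
    x_adv = fgsm_perturb(model, loss_fn, x, y)
    opt.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)  # clean + adversarial loss (one common AT variant)
    loss.backward()
    opt.step()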
“…Szegedy et al. [53] first showed that an input image perturbed with changes that are imperceptible to the human eye is capable of biasing convolutional neural networks (CNNs) to produce wrong labels with high confidence. Since then, numerous methods for generating adversarial examples [4,7,15,24,25,30,32,33,35,36,37,51] and defending against adversarial attacks [6,15,16,38,55,60] have been proposed. The important defence method of adversarial training [15,21,25,30] requires generating adversarial examples during training.…”
Section: Introduction
confidence: 99%