2016
DOI: 10.48550/arxiv.1612.00334
Preprint

A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples

Beilun Wang,
Ji Gao,
Yanjun Qi

Abstract: Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to make theoretical steps towards fully understanding adversarial examples. By using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool…
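As a reading aid (this formalization is standard in the adversarial-examples literature and is not quoted from the truncated abstract above), an adversarial example for a classifier f at an input x is usually a perturbed input x + δ satisfying

\[
  f(x + \delta) \neq f(x), \qquad \|\delta\|_p \le \epsilon ,
\]

where the perturbation budget ε is small enough that x + δ remains visually indistinguishable from x.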

Cited by 21 publications (34 citation statements)
References 44 publications
“…In the following we will present a proof based on the model presented in Section 3.2 and the currently accepted definition of adversarial examples (Wang et al., 2016) that shows that feature redundancy is indeed a necessary condition for adversarial examples. Throughout, we assume that a learning model can be expressed as f(·) = g(T(·)), where T(·) represents the feature extraction function and g(·) is a simple decision-making function, e.g., logistic regression, using the extracted features as the input.…”
Section: Abuse of Redundancy
confidence: 98%
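For concreteness, the following is a minimal Python/NumPy sketch of the decomposition f(·) = g(T(·)) referenced in the excerpt above. The particular feature map T, its dimensions, and the parameter values are hypothetical placeholders for illustration, not the construction used in the cited proof.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 8))          # hypothetical fixed weights for the feature extractor

def T(x):
    # T(.): feature extraction function (here an arbitrary non-linear map)
    return np.tanh(x @ W)

def g(z, w, b):
    # g(.): simple decision function on the extracted features (logistic regression)
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

def f(x, w, b):
    # composed classifier f(x) = g(T(x))
    return g(T(x), w, b)

# usage: score one random 20-dimensional input with random logistic-regression parameters
x = rng.normal(size=(1, 20))
print(f(x, rng.normal(size=8), 0.0))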
“…Later, boundary-based analysis has been derived to show that adversarial examples try to cross the decision boundaries (He et al., 2018). More studies regarding the data manifold have also been leveraged to better understand these perturbations (Ma et al., 2018; Gilmer et al., 2018; Wang et al., 2016). While these works provide hints to obtain a more fundamental understanding, to the best of our knowledge, no study was able to create a model that results in actionable recommendations to improve the robustness of machine learners against adversarial attacks.…”
Section: Related Work
confidence: 99%
“…Adversarial training (AT) (Szegedy et al., 2013; Goodfellow et al., 2014; Wang et al., 2016a) is a powerful regularization technique that has been primarily explored in the CV field to improve the robustness of models against input perturbations. In the NLP field, AT has been applied to various tasks by extending the concept of adversarial perturbations, e.g., text classification (Miyato et al., 2016; Sato et al., 2018), part-of-speech tagging (Yasunaga et al., 2018), relation extraction (Wu et al., 2017), and machine reading comprehension (Wang et al., 2016b).…”
Section: Adversarial Training and Its Extension, Virtual Adversarial Training
confidence: 99%
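As a rough illustration of the adversarial training idea mentioned in this excerpt, here is a generic FGSM-style sketch in Python/PyTorch. The toy linear model, batch, epsilon, and the choice to sum clean and adversarial losses are assumptions made for illustration; this does not reproduce the procedure of any cited paper.

import torch
import torch.nn as nn

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.1):
    # FGSM: one signed-gradient step on the input, with magnitude epsilon
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

model = nn.Linear(20, 2)                                   # toy classifier (hypothetical)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))     # hypothetical batch

for _ in range(5):                                         # a few adversarial-training steps
    x_adv = fgsm_perturb(model, loss_fn, x, y)
    opt.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)  # clean + adversarial loss (one common AT variant)
    loss.backward()
    opt.step()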
“…Szegedy et al. [53] first showed that an input image perturbed with changes that are imperceptible to the human eye is capable of biasing convolutional neural networks (CNNs) to produce wrong labels with high confidence. Since then, numerous methods for generating adversarial examples [4,7,15,24,25,30,32,33,35,36,37,51] and defending against adversarial attacks [6,15,16,38,55,60] have been proposed. The important defence method of adversarial training [15,21,25,30] requires generating adversarial examples during training.…”
Section: Introduction
confidence: 99%