“…But the adoption of these models in safety-critical or high-security applications is inhibited by two major concerns: their brittleness to adversarial attack methods that can make imperceptible modifications to inputs and trigger wrong decisions (Szegedy et al., 2013; Papernot et al., 2016a), and their lack of interpretability (Gunning, 2017). Significant progress has been made toward both adversarial robustness (Papernot et al., 2016b; Madry et al., 2017; Engstrom et al., 2018) and explainability (Li & Yu, 2015; Yi et al., 2016; Sundararajan et al., 2017), and a few recent theoretical studies (Kilbertus et al., 2018; Chalasani et al., 2018) indicate a strong connection between these two issues.…”