Recently, out-of-distribution (OOD) detection has received considerable attention, because confident labels assigned to OOD examples represent a vulnerability similar to adversarial input perturbation. We are interested in models that combine the benefits of being robust to adversarial input and being able to detect OOD examples. Furthermore, we require that both in-distribution classification and OOD detection be robust to adversarial input perturbation. Several related studies apply an ad-hoc combination of design choices to achieve similar goals. Various functions over the logit or softmax layer can be used to define training objectives, OOD detection methods, and adversarial attacks. Here, we present a design space that covers such design choices, as well as a principled way of evaluating the networks. This includes a strong attack scenario where both in-distribution and OOD examples are adversarially perturbed to mislead OOD detection. We draw several interesting conclusions based on our empirical analysis of this design space. Most importantly, we argue that the key factor is not the OOD training or detection method in itself, but rather the application of matching detection and training methods.
I. INTRODUCTION

Although computer vision models have achieved remarkable performance on various recognition tasks in recent years, they are susceptible to adversarial input [1]-[3], where invisibly small but well-designed input perturbations mislead state-of-the-art models. The sensitivity of current models to adversarial input indicates that these models are not well aligned with human perception. Among the many defenses against input perturbation, adversarial training has been found to be the most effective [4], [5]. In a nutshell, adversarial training means that the model is trained on the adversarially perturbed version of the training data to improve its robustness.

Recently, robust out-of-distribution (OOD) detection has also received considerable attention [6]-[8]. Adversarially trained models are relatively robust to adversarial input, but they might assign high confidence to OOD samples. In a real-world application, this also represents a serious vulnerability [8]. Besides, OOD input is also open to adversarial perturbation,
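To make the idea of adversarial training concrete, the following is a minimal sketch on a toy logistic-regression model in numpy, not the deep-network setup studied in this paper: in each update step the inputs are first perturbed with a one-step FGSM attack (sign of the input gradient of the loss), and the gradient step is then taken on the perturbed batch. All names here (`fgsm_perturb`, the toy data, `eps`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One-step FGSM: shift each input by eps in the sign of the
    input gradient of the binary cross-entropy loss."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # dL/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Toy linearly separable two-class data.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w, b = np.zeros(5), 0.0
lr, eps = 0.1, 0.1
for _ in range(300):
    # Adversarial training: compute the loss gradient on perturbed inputs.
    X_adv = fgsm_perturb(X, y, w, b, eps)
    p = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

# Robust accuracy: evaluated on freshly FGSM-perturbed inputs.
p_adv = sigmoid(fgsm_perturb(X, y, w, b, eps) @ w + b)
acc = np.mean((p_adv > 0.5) == (y == 1))
print(f"robust train accuracy: {acc:.2f}")
```

The same attack that defines the training perturbation is reused at evaluation time, which mirrors the standard practice of measuring robust accuracy under the threat model used during training.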