2019
DOI: 10.48550/arxiv.1905.09186
Preprint

Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection

Cited by 4 publications (7 citation statements)
References 0 publications
“…For instance, Wang et al (2021) trains a recurrent neural network that captures the difference in the logits distribution of manipulated samples. Aigrain and Detyniecki (2019), instead, achieves good detection performance by feeding a simple three-layer neural network directly with the logit activations.…”
Section: Logits-based Adversarial Detectors
confidence: 99%
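The quoted description lends itself to a concrete illustration. Below is a minimal sketch, assuming a PyTorch setting, of a detector in this spirit: a small three-layer network fed the frozen base classifier's logits and trained as a binary clean-vs-adversarial classifier. The class name, layer sizes, and training loop are illustrative assumptions, not the architecture from the paper.

```python
# Hedged sketch of a logit-based adversarial detector in the spirit of
# Aigrain and Detyniecki (2019). Sizes and names are assumptions.
import torch
import torch.nn as nn

class LogitDetector(nn.Module):
    """Three-layer MLP mapping base-model logits to a single AE score."""
    def __init__(self, num_classes: int = 10, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # one output logit: adversarial vs. clean
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return self.net(logits)

def train_detector(clean_logits, adv_logits, epochs=10, lr=1e-3):
    """clean_logits/adv_logits: base-model logits on clean and attacked inputs."""
    x = torch.cat([clean_logits, adv_logits])
    y = torch.cat([torch.zeros(len(clean_logits), 1),
                   torch.ones(len(adv_logits), 1)])
    det = LogitDetector(num_classes=x.shape[1])
    opt = torch.optim.Adam(det.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(det(x), y).backward()
        opt.step()
    return det
```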
“…Previous research showed that analyzing the model's logits leads to promising results in discriminating manipulated inputs (Wang et al, 2021;Aigrain and Detyniecki, 2019;Hendrycks and Gimpel, 2016). However, logits-based adversarial detectors have been only studied on computer vision applications.…”
Section: Introduction
confidence: 99%
“…In supervised detection, the defender considers AEs generated by one or more adversarial attack algorithms in designing and training the detector D. It is believed that AEs have distinguishable features that make them different from clean inputs [26], hence, defenders take this advantage to build a robust detector D. To accomplish this, many approaches have been presented in the literature. [Flattened table omitted: softmax-based detectors [25], [80], [87], [88] and the attacks (FGSM, BIM, JSMA, DF) each was evaluated against.]…”
Section: Supervised Detection
confidence: 99%
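The supervised setup quoted above can be made concrete with a short sketch: the defender crafts adversarial examples with a known attack and labels them as positives when training D. One-step FGSM, one of the attacks named in the quote, is used below to generate such data; `model` and `eps` are assumed placeholders, with inputs taken to lie in [0, 1].

```python
# Hedged sketch of generating labelled training data for a supervised
# detector D: craft AEs with FGSM (one of the attacks named above).
# `model` is an assumed differentiable classifier with inputs in [0, 1].
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """One-step FGSM: perturb x along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# The detector is then trained on (clean inputs, label 0) and
# (fgsm(model, clean inputs, labels), label 1).
```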
“…The detector D considers an input as an AE if there is no match between the baseline classifier and the retrained classifier. Aigrain et al [87] built a simple NN detector D which takes the baseline model logits of clean inputs and AEs as inputs to build a binary classifier. Finally, following the hypothesis that different models make different mistakes when presented with the same attack inputs, Monteiro et al [88] proposed a bimodel mismatch detection.…”
Section: Auxiliary Model Approach
confidence: 99%
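The bimodel mismatch idea admits a very short sketch: under the hypothesis quoted above, two independently trained classifiers disagree more often on adversarial inputs than on clean ones, so disagreement itself becomes the detection signal. The models below are assumed placeholders, not the exact setup of Monteiro et al. [88].

```python
# Hedged sketch of bimodel mismatch detection: flag inputs on which two
# independently trained classifiers disagree. Models are placeholders.
import torch

@torch.no_grad()
def mismatch_detect(model_a, model_b, x):
    """Return a boolean mask, True where the models disagree (suspected AE)."""
    return model_a(x).argmax(dim=1) != model_b(x).argmax(dim=1)
```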
“…These two families of approaches have their own limitations: the former is more computationally expensive, while the latter provides a weaker defense, albeit at a lower computational overhead. Instead of making the model robust, there are also approaches to detect these attacks [26,16,48,3,47,29,43,23]. These methods often require retraining of the network [16,3,23].…”
Section: Introduction
confidence: 99%