Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020)
DOI: 10.1145/3368089.3409754

Is neuron coverage a meaningful measure for testing deep neural networks?

Abstract: Recent effort to test deep learning systems has produced an intuitive and compelling test criterion called neuron coverage (NC), which resembles the notion of traditional code coverage. NC measures the proportion of neurons activated in a neural network, and it is implicitly assumed that increasing NC improves the quality of a test suite. In an attempt to automatically generate a test suite that increases NC, we design a novel diversity-promoting regularizer that can be plugged into existing adversarial attack …
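To make the coverage definition concrete, below is a minimal sketch of how neuron coverage is commonly computed: a neuron counts as covered if its activation exceeds a threshold on at least one input of the test suite. The toy random-weight network, the layer shapes, and the threshold of 0.0 are illustrative assumptions, not the paper's actual models or settings.

```python
import numpy as np

# Illustrative assumption: a tiny fully connected ReLU network with random
# weights stands in for the model under test; the activation threshold
# (0.0, i.e. "the neuron fired") is likewise an assumption.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 32)), rng.standard_normal((32, 16))]

def hidden_activations(x):
    """Return the post-ReLU activations of every hidden neuron for one input."""
    acts, h = [], x
    for w in weights:
        h = np.maximum(h @ w, 0.0)   # ReLU layer
        acts.append(h)
    return np.concatenate(acts)      # flatten all hidden neurons

def neuron_coverage(test_suite, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold`
    on at least one input of the test suite."""
    covered = None
    for x in test_suite:
        fired = hidden_activations(x) > threshold
        covered = fired if covered is None else (covered | fired)
    return covered.mean()

# Usage: neuron coverage of a random 100-input test suite.
suite = rng.standard_normal((100, 64))
print(f"neuron coverage: {neuron_coverage(suite):.2%}")
```

The quantity is a single ratio over the whole test suite, which is why generating inputs that each activate previously uncovered neurons (e.g. via the diversity-promoting regularizer mentioned in the abstract) is the natural way to drive it upward.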

Cited by 124 publications (100 citation statements); references 35 publications.
“…Adversarial attacks often achieve higher robustness improvement than all three neuron coverage-guided fuzzing algorithms for simpler datasets such as MNIST, Fashion-MNIST and SVHN. This casts a shadow on the usefulness of the test cases generated by neuron coverage-guided fuzzing algorithms in improving model robustness and is consistent with [6], [13], [23].…”
Section: RQ2: How Effective Is Our FOL Metric for Test Case Selection? (supporting)
confidence: 62%
“…Along with the testing metrics, many test case generation algorithms are also proposed, including gradient-guided perturbation [30], [46], black-box [42] and metric-guided fuzzing [12], [21], [43]. However, these testing works lack rigorous evaluation of their usefulness in improving model robustness (although most of them claim so) and have been shown to be ineffective in multiple recent works [6], [13], [23]. Multiple metrics have been proposed in the machine learning community to quantify the robustness of DL models as well [2], [40], [41], [44].…”
Section: Related Work (mentioning)
confidence: 99%
“…Due to the popularity of DL models and the critical importance of their reliability, a growing body of research effort has been dedicated to testing DL models, with a focus on adversarial attacks [14,21,32,[46][47][48] for model robustness, the discussion of various metrics for DL model testing [36,39,43,52,69], and testing DL models for specific applications [63,71,78]. Meanwhile, both running and testing DL models inevitably involve the underlying DL libraries, which serve as central pieces of infrastructure for building, training, optimizing and deploying DL models.…”
Section: Introduction (mentioning)
confidence: 99%