Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis 2021
DOI: 10.1145/3460319.3464811

DeepHyperion: exploring the feature space of deep learning-based systems through illumination search

Abstract: Deep Learning (DL) has been successfully applied to a wide range of application domains, including safety-critical ones. Several DL testing approaches have been recently proposed in the literature, but none of them aims to assess how different interpretable features of the generated inputs affect the system's behaviour. In this paper, we resort to Illumination Search to find the highest-performing test cases (i.e., misbehaving and closest to misbehaving), spread across the cells of a map representing the feature…
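The idea of illumination search described in the abstract can be illustrated with a minimal MAP-Elites-style sketch. This is not DeepHyperion's implementation: the two-dimensional input, the `features` descriptor, the `fitness` function (lower meaning closer to misbehaving), the 10×10 grid, and the mutation operator are all hypothetical stand-ins chosen only to show how the best individual found so far is kept per feature-map cell.

```python
import random

GRID = 10  # hypothetical: 10 bins per feature dimension


def features(x):
    # Hypothetical feature descriptor: bin each coordinate of x into 0..GRID-1.
    return tuple(min(int(v * GRID), GRID - 1) for v in x)


def fitness(x):
    # Hypothetical fitness: distance to a toy "misbehaviour frontier" x0 + x1 = 1.
    # Lower values stand for inputs closer to misbehaving.
    return abs(x[0] + x[1] - 1.0)


def mutate(x, rng, sigma=0.1):
    # Gaussian perturbation, clamped to the valid input range [0, 1].
    return tuple(min(max(v + rng.gauss(0.0, sigma), 0.0), 1.0) for v in x)


def illuminate(iterations=2000, initial=20, seed=0):
    rng = random.Random(seed)
    archive = {}  # maps a feature cell to (fitness, individual)

    def try_insert(x):
        cell, f = features(x), fitness(x)
        # Keep x only if its cell is empty or x beats the current occupant.
        if cell not in archive or f < archive[cell][0]:
            archive[cell] = (f, x)

    # Seed the map with random individuals.
    for _ in range(initial):
        try_insert((rng.random(), rng.random()))

    # Repeatedly pick an occupant, mutate it, and try to place the child.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, rng))

    return archive


if __name__ == "__main__":
    result = illuminate()
    print(f"filled cells: {len(result)} of {GRID * GRID}")
```

The returned dictionary plays the role of the feature map: each key is a cell, and each value is the best-performing individual found for that combination of feature values.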

Cited by 42 publications (22 citation statements)
References 32 publications

“…We suspected that accounting for numerous redundant test inputs affected our correlation analysis. In practice, selecting or generating test inputs that trigger failures (i.e., mispredictions) is far more useful when these failures are diverse [60]. A test set that repeatedly exposes the same problem in the DNN model is a waste of computational resources, especially when we have a limited testing budget and a high labeling cost for testing data [60].…”
Section: Evaluation and Results (mentioning)
confidence: 99%
“…In practice, selecting or generating test inputs that trigger failures (i.e., mispredictions) is far more useful when these failures are diverse [60]. A test set that repeatedly exposes the same problem in the DNN model is a waste of computational resources, especially when we have a limited testing budget and a high labeling cost for testing data [60]. This is why, similar to other studies comparing the effectiveness of test strategies with regular software, we want here to address the notion of faults detected in DNNs and study their association with diversity and coverage.…”
Section: Evaluation and Results (mentioning)
confidence: 99%
“…Some DNN testing approaches can provide explanations for portions of the input space in which DNN errors are observed [1,19,56]. For instance, Abdessalem et al. [1] rely on evolutionary algorithms to search for test inputs using simulators and, to maximize test effectiveness, decision trees are used during the search process to learn the regions of the input space that are likely unsafe and, hence, should be targeted by testing.…”
Section: Related Work (mentioning)
confidence: 99%
“…Further, recent work studies the effectiveness of decision trees in characterizing the input space of the simulator-based testing process [19]. Finally, DeepHyperion [56] configures a generative model using a metaheuristic search algorithm directed towards generating test inputs in a specific dimension of the input space and provides a set of feature maps which visualize the degree of accuracy obtained for different values of dimension pairs. Different from SEDE, these DNN testing approaches can only be used to characterize simulated scenarios, not real-world ones.…”
Section: Related Work (mentioning)
confidence: 99%
“…Riccio and Tonella [20] propose DeepJanus, a model-based test generator that uses Catmull–Rom splines to characterize the road shape and generate inputs that are at the behavioral frontier of a self-driving car model. Zohdinasab et al. [57] use illumination search to cover the map of external behaviors of a self-driving vehicle. Riccio et al. [58] augment existing test suites by mutation adequacy-guided test generation.…”
Section: Related Work (mentioning)
confidence: 99%