DeepHyperion: exploring the feature space of deep learning-based systems through illumination search

Zohdinasab, Tahereh; Riccio, Vincenzo; Gambi, Alessio; Tonella, Paolo

doi:10.1145/3460319.3464811

Cited by 42 publications

(22 citation statements)

References 32 publications

“…We suspected that accounting for numerous redundant test inputs affected our correlation analysis. In practice, selecting or generating test inputs that trigger failures (i.e., mispredictions) is far more useful when these failures are diverse [60]. A test set that repeatedly exposes the same problem in the DNN model is a waste of computational resources, especially when we have a limited testing budget and a high labeling cost for testing data [60].…”

Section: Evaluation and Results (mentioning)

confidence: 99%

“…In practice, selecting or generating test inputs that trigger failures (i.e., mispredictions) is far more useful when these failures are diverse [60]. A test set that repeatedly exposes the same problem in the DNN model is a waste of computational resources, especially when we have a limited testing budget and a high labeling cost for testing data [60]. This is why, similar to other studies comparing the effectiveness of test strategies with regular software, we want here to address the notion of faults detected in DNNs and study their association with diversity and coverage.…”

Section: Evaluation and Results (mentioning)

confidence: 99%
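To make the point about redundant failures concrete, here is a minimal Python sketch that separates the raw number of mispredictions from a crude count of distinct failure types. Treating each distinct (true label, predicted label) pair as one "problem" is an assumption made only for illustration; it is not the fault-estimation method of the cited work, and `failure_summary` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple


def failure_summary(labels: Iterable[int], predictions: Iterable[int]) -> Tuple[int, int]:
    """Return (number of mispredictions, number of distinct confusions).

    A "confusion" is a (true label, predicted label) pair with true != predicted.
    Counting distinct pairs is only a crude, hypothetical proxy for distinct
    faults: every confusion between the same two classes counts once.
    """
    confusions = Counter((y, p) for y, p in zip(labels, predictions) if y != p)
    return sum(confusions.values()), len(confusions)


# Two hypothetical test sets with the same number of failures.
labels_a, preds_a = [3, 3, 3, 3, 7], [8, 8, 8, 8, 7]  # same confusion four times
labels_b, preds_b = [3, 5, 7, 9, 1], [8, 6, 1, 4, 1]  # four different confusions

print(failure_summary(labels_a, preds_a))
print(failure_summary(labels_b, preds_b))
```

Both hypothetical test sets trigger four mispredictions, but the first exposes a single confusion (3 predicted as 8) four times, while the second exposes four different ones; only the second would be considered diverse in the sense of the statement above.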

Black-Box Testing of Deep Neural Networks through Test Case Diversity

Aghababaeyan, Abdellatif, Briand, et al. 2023

IEEE Trans. Software Eng.

Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNNs. Despite very active research on DNN coverage, several recent studies have questioned the usefulness of such criteria in guiding DNN testing. Further, from a practical standpoint, these criteria are white-box as they require access to the internals or training data of DNNs, which is often not feasible or convenient. Measuring such coverage requires executing DNNs with candidate inputs to guide testing, which is not an option in many practical contexts. In this paper, we investigate diversity metrics as an alternative to white-box coverage criteria. For the previously mentioned reasons, we require such metrics to be black-box and not rely on the execution and outputs of DNNs under test. To this end, we first select and adapt three diversity metrics and study, in a controlled manner, their capacity to measure actual diversity in input sets. We then analyze their statistical association with fault detection using four datasets and five DNNs. We further compare diversity with state-of-the-art white-box coverage criteria. As a mechanism to enable such analysis, we also propose a novel way to estimate fault detection in DNNs. Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria to effectively guide DNN testing. Indeed, we found that one of our selected black-box diversity metrics far outperforms existing coverage criteria in terms of fault-revealing capability and computational time.
Results also confirm the suspicions that state-of-the-art coverage criteria are not adequate to guide the construction of test input sets to detect as many faults as possible using natural inputs.
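As an illustration of a black-box diversity score that needs only input features, not the DNN's internals or outputs, the sketch below computes a determinant-based geometric diversity: the log-volume spanned by unit-normalized feature vectors. The choice of this particular metric, the normalization, and the function name `geometric_diversity` are assumptions for illustration; the abstract does not say which three metrics the study adapts or how image features are extracted.

```python
import numpy as np


def geometric_diversity(features: np.ndarray) -> float:
    """Log of the squared volume spanned by a set of feature vectors.

    features: array of shape (n_inputs, n_features), e.g. image embeddings
    produced by some feature extractor (assumed, not shown here).
    Higher values mean the inputs point in more different directions.
    """
    x = np.asarray(features, dtype=float)
    # Scale each row to unit length so only directions matter, not magnitudes.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    v = x / norms
    # det(V V^T) is the squared volume of the parallelotope spanned by the rows.
    sign, logdet = np.linalg.slogdet(v @ v.T)
    # sign <= 0 signals (numerically) linearly dependent rows, e.g. duplicates.
    return float(logdet) if sign > 0 else float("-inf")


orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])
similar = np.array([[1.0, 0.0], [0.9, 0.1]])
print(geometric_diversity(orthogonal), geometric_diversity(similar))
```

Because the Gram matrix V V^T has rank at most the feature dimension, this sketch is only meaningful when a set has no more inputs than feature dimensions; redundant inputs (duplicates) collapse the volume to zero, which mirrors the concern about redundant test inputs raised in the citation statements.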

“…Some DNN testing approaches can provide explanations for portions of the input space in which DNN errors are observed [1,19,56]. For instance, Abdessalem et al. [1] rely on evolutionary algorithms to search for test inputs using simulators and, to maximize test effectiveness, decision trees are used during the search process to learn the regions of the input space that are likely unsafe and, hence, should be targeted by testing.…”

Section: Related Work (mentioning)

confidence: 99%

“…Further, recent work studies the effectiveness of decision trees in characterizing the input space of the simulator-based testing process [19]. Finally, DeepHyperion [56] configures a generative model using a metaheuristic search algorithm directed towards generating test inputs in a specific dimension of the input space and provides a set of feature maps which visualize the degree of accuracy obtained for different values of dimension pairs. Unlike SEDE, these DNN testing approaches can only be used to characterize simulated scenarios, not real-world ones.…”

Section: Related Work (mentioning)

confidence: 99%
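The feature-map idea behind illumination search can be sketched with a MAP-Elites-style archive, the best-known illumination-search algorithm: the feature space is split into a grid, and each cell keeps the single best input found so far. Everything below is a toy under stated assumptions (two-dimensional inputs, the coordinates themselves as features, a hypothetical `fitness` where negative means misbehaviour, Gaussian mutation); it is not DeepHyperion's implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Cell = Tuple[int, int]
Bounds = Tuple[Tuple[float, float], Tuple[float, float]]


def to_cell(feats: Tuple[float, float], bounds: Bounds, bins: int) -> Cell:
    """Discretize two feature values into a grid cell of the feature map."""
    idx = []
    for v, (lo, hi) in zip(feats, bounds):
        i = int((v - lo) / (hi - lo) * bins)
        idx.append(min(max(i, 0), bins - 1))  # clamp out-of-range values
    return idx[0], idx[1]


def map_elites(random_input: Callable[[random.Random], Vector],
               mutate: Callable[[Vector, random.Random], Vector],
               features: Callable[[Vector], Tuple[float, float]],
               fitness: Callable[[Vector], float],
               bounds: Bounds, bins: int = 10, n_init: int = 50,
               iterations: int = 2000, seed: int = 0) -> Dict[Cell, Tuple[Vector, float]]:
    """Keep, per cell, the input with the lowest fitness (lower = closer to misbehaviour)."""
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[Vector, float]] = {}

    def try_add(x: Vector) -> None:
        cell = to_cell(features(x), bounds, bins)
        f = fitness(x)
        # Fill an empty cell, or replace its elite with a better input.
        if cell not in archive or f < archive[cell][1]:
            archive[cell] = (x, f)

    for _ in range(n_init):
        try_add(random_input(rng))
    for _ in range(iterations):
        parent = rng.choice(list(archive.values()))[0]  # pick a random elite
        try_add(mutate(parent, rng))
    return archive


# Toy problem (all hypothetical): inputs are points in [0, 1]^2, the two
# "features" are the coordinates, and the system misbehaves when x0 + x1 > 1.5.
def random_input(rng: random.Random) -> Vector:
    return [rng.random(), rng.random()]


def mutate(x: Vector, rng: random.Random) -> Vector:
    return [min(1.0, max(0.0, v + rng.gauss(0.0, 0.1))) for v in x]


def features(x: Vector) -> Tuple[float, float]:
    return x[0], x[1]


def fitness(x: Vector) -> float:
    return 1.5 - (x[0] + x[1])  # negative = misbehaviour


archive = map_elites(random_input, mutate, features, fitness,
                     bounds=((0.0, 1.0), (0.0, 1.0)))
# Render the map: 'X' = elite misbehaves, '.' = covered, ' ' = unexplored.
for row in range(9, -1, -1):
    print("".join(" " if (col, row) not in archive
                  else ("X" if archive[(col, row)][1] < 0 else ".")
                  for col in range(10)))
```

The printed grid is a crude text version of a feature map. A real feature map would be indexed by domain-specific features (the statement above mentions accuracy per dimension pair) rather than by raw input coordinates.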

Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems

Fahmy, Pastore, Briand

2022

Preprint

When Deep Neural Networks (DNNs) are used in safety-critical systems, engineers should determine the safety risks associated with DNN errors observed during testing. For DNNs processing images, engineers visually inspect all error-inducing images to determine common characteristics among them. Such characteristics correspond to hazard-triggering events (e.g., low illumination) that are essential inputs for safety analysis. Though informative, such activity is expensive and error-prone. To support such safety analysis practices, we propose SEDE, a technique that generates readable descriptions for commonalities in error-inducing, real-world images and improves the DNN through effective retraining. SEDE leverages the availability of simulators, which are commonly used for cyber-physical systems. SEDE relies on genetic algorithms to drive simulators towards the generation of images that are similar to error-inducing, real-world images in the test set; it then leverages rule learning algorithms to derive expressions that capture commonalities in terms of simulator parameter values. The derived expressions are then used to generate additional images to retrain and improve the DNN. With DNNs performing in-car sensing tasks, SEDE successfully characterized hazard-triggering events leading to a DNN accuracy drop. Also, SEDE enabled retraining to achieve significant improvements in DNN accuracy, up to 18 percentage points. CCS Concepts: • Software and its engineering → Software maintenance tools; Search-based software engineering; Software testing and debugging; • Computing methodologies → Machine learning.
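The rule-learning step can be illustrated with a deliberately simple stand-in: search every simulator parameter for the single threshold rule whose covered images have the highest error rate. This 1R-style learner, the parameter names `sun_altitude` and `fog`, and the data are hypothetical; the abstract does not say which rule learning algorithm SEDE uses, and a real learner would produce richer, multi-condition expressions.

```python
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, float]  # simulator parameter name -> value
Rule = Tuple[str, str, float, float]  # (param, op, threshold, error rate)


def best_threshold_rule(samples: List[Sample], error: List[bool]) -> Rule:
    """Find the rule 'param <= t' or 'param > t' whose covered samples have
    the highest error rate, breaking ties by the number of covered samples."""
    best: Optional[Tuple[Tuple[float, int], Rule]] = None
    for param in samples[0]:
        for t in sorted({s[param] for s in samples}):
            for op in ("<=", ">"):
                covered = [e for s, e in zip(samples, error)
                           if (s[param] <= t) == (op == "<=")]
                if not covered:
                    continue
                rate = sum(covered) / len(covered)
                key = (rate, len(covered))
                if best is None or key > best[0]:
                    best = (key, (param, op, t, rate))
    assert best is not None, "need at least one sample"
    return best[1]


# Hypothetical simulator settings of generated images and whether the DNN erred.
samples = [
    {"sun_altitude": 5, "fog": 0.1},
    {"sun_altitude": 8, "fog": 0.6},
    {"sun_altitude": 30, "fog": 0.2},
    {"sun_altitude": 60, "fog": 0.7},
    {"sun_altitude": 45, "fog": 0.5},
]
errors = [True, True, False, False, False]
print(best_threshold_rule(samples, errors))
```

On this toy data the learned rule says that all images rendered with `sun_altitude <= 8` are error-inducing, the kind of readable, parameter-level description (here, low illumination) that the abstract has in mind.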

“…Riccio and Tonella [20] propose DeepJanus, a model-based test generator that uses Catmull–Rom splines to characterize the road shape and generate inputs that are at the behavioral frontier of a self-driving car model. Zohdinasab et al. [57] use illumination search to cover the map of external behaviors of a self-driving vehicle. Riccio et al. [58] augment existing test suites by mutation adequacy-guided test generation.…”

Section: Related Work (mentioning)

confidence: 99%

Confidence‐driven weighted retraining for predicting safety‐critical failures in autonomous driving systems

Stocco, Tonella

2021

J. Softw. Evol. Process

Self Cite

Safe handling of hazardous driving situations is a task of high practical relevance for building reliable and trustworthy cyber-physical systems such as autonomous driving systems. This task necessitates an accurate prediction system of the vehicle's confidence to prevent potentially harmful system failures on the occurrence of unpredictable conditions that make it less safe to drive. In this paper, we discuss the challenges of adapting a misbehavior predictor with knowledge mined during the execution of the main system. Then, we present a framework for the continual learning of misbehavior predictors, which records in-field behavioral data to determine what data are appropriate for adaptation. Our framework guides adaptive retraining using a novel combination of in-field confidence metric selection and reconstruction error-based weighting. We evaluate our framework to improve a misbehavior predictor from the literature on the Udacity simulator for self-driving cars. Our results show that our framework can reduce the false positive rate by a large margin and can adapt to nominal behavior drifts while maintaining the original capability to predict failures up to several seconds in advance.
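One way to read "reconstruction error-based weighting" is sketched below: each in-field sample gets a retraining weight proportional to the reconstruction error of the misbehavior predictor's reconstruction model (for example an autoencoder), so that poorly reconstructed data counts more. This proportional scheme, the `floor` parameter, and the helper name `reconstruction_weights` are assumptions, not the framework's published formula.

```python
from typing import Sequence

import numpy as np


def reconstruction_weights(errors: Sequence[float], floor: float = 1e-3) -> np.ndarray:
    """Hypothetical per-sample retraining weights from reconstruction errors.

    Assumption: weights are proportional to the error, so samples the current
    model reconstructs poorly get more influence during retraining.
    """
    e = np.asarray(errors, dtype=float)
    w = np.maximum(e, floor)      # floor keeps every sample at a nonzero weight
    return w * (len(w) / w.sum())  # rescale so the weights average to 1


print(reconstruction_weights([0.1, 0.3]))
```

The resulting array could, for instance, be passed as the `sample_weight` argument of Keras `Model.fit`; rescaling to a mean of 1 keeps the overall loss magnitude comparable to unweighted training.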
