Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis 2021
DOI: 10.1145/3460319.3464801
Exposing previously undetectable faults in deep neural networks

Abstract: Existing methods for testing DNNs solve the oracle problem by constraining the raw features (e.g. image pixel values) to be within a small distance of a dataset example for which the desired DNN output is known. But this limits the kinds of faults these approaches are able to detect. In this paper, we introduce a novel DNN testing method that is able to find faults in DNNs that other methods cannot. The crux is that, by leveraging generative machine learning, we can generate fresh test inputs that vary in thei…


Cited by 14 publications (10 citation statements). References 32 publications.
“…For example, when testing Deep Neural Networks (DNNs) that recognise handwritten digits in greyscale images, two dimensions of interest may be the boldness and discontinuity of the handwriting stroke (see section 2). In this case, DEEPHYPERION-CS uses the misclassification distance as fitness function [12,59,18] to generate greyscale images containing digits written using strokes with different boldness and discontinuity (see Figure 1). The misclassification distance is computed as the difference between the activation value of the neuron associated with the correct label and the highest incorrect activation from the DNN's softmax layer output (hence, it becomes negative as a misclassification occurs).…”
Section: Guiding Illumination Search With Contribution Score
confidence: 99%
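The misclassification distance described in the statement above can be sketched directly from a model's softmax output. This is a minimal illustration, not the cited tool's implementation; the function name is hypothetical.

```python
import numpy as np

def misclassification_distance(softmax_output, correct_label):
    # Fitness sketch: activation of the correct class minus the highest
    # incorrect activation. Negative values signal a misclassification.
    probs = np.asarray(softmax_output, dtype=float)
    incorrect = np.delete(probs, correct_label)
    return probs[correct_label] - incorrect.max()
```

A search-based generator would minimize this value over candidate inputs, driving it below zero to expose a misclassification.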
“…However, they are not focused on generating inputs with different structural features and, thus, covering the feature map. Another alternative for input generation are generative ML approaches that approximate the input distribution, such as Variational Auto-Encoders (VAEs) [37] and Generative Adversarial Networks (GANs) [18]. VAEs and GANs are very useful when a model of the inputs is not available, e.g., real-world images from ImageNet.…”
Section: DeepHyperion-CS Is a Model-based Test Input Generation Techn...
confidence: 99%
“…Among the seven possibilities tried, the authors found this objective to be more effective in the formulation present in Equation 2.16, where Z(x) represents the output of the target classifier f except the last softmax layer, and i and j are the indices of y and y_target, respectively. By incorporating the constraint in the objective function, the optimization problem becomes minimizing Equation 2.17, where c is a suitable positive constant and D one of the L0, L2, and L∞ norms.…”
Section: Adversarial Attacks
confidence: 99%
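The formulation paraphrased above matches the well-known Carlini–Wagner attack; as a hedged reconstruction (the symbols follow the Carlini–Wagner paper, not necessarily the cited thesis's Equations 2.16–2.17):

```latex
% Objective over logits Z(x'), with t the index of the target label y_target:
f(x') = \max\Bigl(\max_{i \neq t} Z(x')_i - Z(x')_t,\; 0\Bigr)

% Full problem: distance D (an L_0, L_2, or L_\infty norm) plus the
% constraint folded in via a positive constant c:
\min_{x'} \; D(x, x') + c \cdot f(x')
```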
“…In the DNN test input generators (TIG) literature [2], [14], [19], [20], [30], [31], with just one notable preprint as an exception [18], we are not aware of any paper aiming to generate true ambiguity directly, while most TIG aim for other objectives. Some works [19], [20], [32] propose to corrupt nominal input in predefined, natural and labelpreserving ways to generate OOD test data.…”
Section: Related Work
confidence: 99%
“…Autoencoders (AEs) are a powerful tool, used in a range of TIG [18], [27], [28], [31]. AEs follow an encoder-decoder architecture as shown in the blue part of Figure 2: an encoder E compresses an input x into a smaller latent space (LS), and the decoder D then attempts to reconstruct x from the LS.…”
Section: A. Interpolation in Autoencoders
confidence: 99%
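The encoder-decoder structure and the latent-space interpolation that gives this section its title can be sketched with a toy linear autoencoder. All dimensions and weights here are illustrative assumptions, not the cited architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear autoencoder: 8-dimensional inputs, 2-dimensional latent space.
input_dim, latent_dim = 8, 2
W_enc = rng.normal(size=(latent_dim, input_dim)) * 0.1
W_dec = rng.normal(size=(input_dim, latent_dim)) * 0.1

def encode(x):
    return W_enc @ x  # E: compress x into the smaller latent space

def decode(z):
    return W_dec @ z  # D: attempt to reconstruct x from the latent code

# Interpolating between two inputs in the latent space yields a candidate
# "in-between" input after decoding -- the idea behind this section.
x1, x2 = rng.normal(size=input_dim), rng.normal(size=input_dim)
z_mid = 0.5 * (encode(x1) + encode(x2))
x_mid = decode(z_mid)
```

In a trained AE the decoder maps such interpolated latent codes back to plausible inputs; here the untrained weights only demonstrate the shapes and data flow.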