DeepGini: prioritizing massive tests to enhance the robustness of deep neural networks

Feng, Yang; Shi, Qiyun; Gao, Xinyu; Wan, Jun; Fang, Chunrong; Chen, Zhenyu

doi:10.1145/3395363.3397357

Cited by 133 publications

(145 citation statements)

References 31 publications

Supporting

Mentioning

144

Contrasting

Order By: Relevance

“…So far this problem remains an open challenge. Prior work like DeepGini has proposed to calculate a Gini index of a test case from the model's output probability distribution [7]. DeepGini's intuition is to favor those test cases with most uncertainty (e.g., a more flat distribution) under the current model's prediction.…”

Section: Fol Guided Test Case Selectionmentioning

confidence: 99%

See 1 more Smart Citation

RobOT: Robustness-Oriented Testing for Deep Learning Systems

Wang

Chen

Sun

et al. 2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

View full text Add to dashboard Cite

Recently, there has been a significant growth of interest in applying software engineering techniques for the quality assurance of deep learning (DL) systems. One popular direction is deep learning testing, where adversarial examples (a.k.a. bugs) of DL systems are found either by fuzzing or guided search with the help of certain testing metrics. However, recent studies have revealed that the commonly used neuron coverage metrics by existing DL testing approaches are not correlated to model robustness. It is also not an effective measurement on the confidence of the model robustness after testing. In this work, we address this gap by proposing a novel testing framework called Robustness-Oriented T esting (RobOT). A key part of RobOT is a quantitative measurement on 1) the value of each test case in improving model robustness (often via retraining), and 2) the convergence quality of the model robustness improvement. RobOT utilizes the proposed metric to automatically generate test cases valuable for improving model robustness. The proposed metric is also a strong indicator on how well robustness improvement has converged through testing. Experiments on multiple benchmark datasets confirm the effectiveness and efficiency of RobOT in improving DL model robustness, with 67.02% increase on the adversarial robustness that is 50.65% higher than the state-of-the-art work DeepGini.

show abstract

Section: Fol Guided Test Case Selectionmentioning

confidence: 99%

“…We adopt the most recent work DeepGini[7] as the baseline of the test case selection strategy. DeepGini calculates a Gini index for each test case according to the output probability distribution of the model.…”

mentioning

confidence: 99%

RobOT: Robustness-Oriented Testing for Deep Learning Systems

Wang

Chen

Sun

et al. 2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

View full text Add to dashboard Cite

show abstract

“…Kim et al [17] proposed surprise guided testing metrics based on the similarity between the training data and test data. Moreover, some prediction probability based data selection metrics [18], [19], [70] are also proposed. Most recently, Wang et al [71] proposed a robustness-oriented data selection metric, however, their metric can only select data that are generated by adversarial attacks, it is out of our consideration.…”

Section: Related Workmentioning

confidence: 99%

“…On the other hand, recent work on DL testing and debugging [12]- [19] have proposed different metrics for test generation and test selection, i.e., the problem of selecting test data that are more likely to be misclassified by the model [20]. As in active learning scenarios, these test data can then be used to improve the model (by retraining).…”

Section: Introductionmentioning

confidence: 99%

“…The proliferation of data selection metrics (coming from active learning and testing) makes it challenging for engineers to decide which one they should use. Indeed, the different metrics have been evaluated on a restricted set of problems (mostly image classification datasets) under incomparable experimental settings (different models and labeling budget) [18], [19]. There is, therefore, a need for a comprehensive study of all these data selection metrics on a common ground involving different classification tasks.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Towards Exploring the Limitations of Active Learning: An Empirical Study

Guo

Cordy

et al. 2021

2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)

View full text Add to dashboard Cite

Deep neural networks (DNNs) are increasingly deployed as integral parts of software systems. However, due to the complex interconnections among hidden layers and massive hyperparameters, DNNs must be trained using a large number of labeled inputs, which calls for extensive human effort for collecting and labeling data. Spontaneously, to alleviate this growing demand, multiple state-of-the-art studies have developed different metrics to select a small yet informative dataset for the model training. These research works have demonstrated that DNN models can achieve competitive performance using a carefully selected small set of data. However, the literature lacks proper investigation of the limitations of data selection metrics, which is crucial to apply them in practice. In this paper, we fill this gap and conduct an extensive empirical study to explore the limits of data selection metrics. Our study involves 15 data selection metrics evaluated over 5 datasets (2 image classification tasks and 3 text classification tasks), 10 DNN architectures, and 20 labeling budgets (ratio of training data being labeled). Our findings reveal that, while data selection metrics are usually effective in producing accurate models, they may induce a loss of model robustness (against adversarial examples) and resilience to compression. Overall, we demonstrate the existence of a trade-off between labeling effort and different model qualities. This paves the way for future research in devising data selection metrics considering multiple quality criteria.

show abstract

Ex ² : Monte Carlo Tree Search‐based test inputs prioritization for fuzzing deep neural networks

Wang

Zhao

et al. 2022

Int J of Intelligent Sys

View full text Add to dashboard Cite

Fuzzing is considered to be an essential approach to guarantee the reliability of deep neural networks (DNNs) based systems. The DNN fuzzing leverages various inputs prioritization methods to guide the testing process. The current research mainly focus on constructing testing metrics that symbolize the logical representation of the DNN to guide the generation of test cases, which neglects the potential performance brought by implementing heuristic algorithm. Moreover, the straightforward implementation of queue structure can not represent the metamorphic relationships between generated inputs in DNN fuzzing. Therefore, developing the appropriate heuristic algorithm‐based inputs prioritization method is critical to improve the performance of DNN fuzzers. In this paper, we propose a Monte Carlo Tree Search (MCTS) based inputs prioritization method called E x 2 $E{x}^{2}$ (Exploration and Exploitation) that formulates DNN testing exploration as the sequential decision process. The technique introduces an innovative tree‐structure design that schedules inputs from the statistical perspective. Different from traditional DNN testing, the batch pool is maintained in the form of nodes in MCTS. The links between nodes precisely represent the metamorphic relationship between input batches, which indicates the potential value for in‐depth search. Furthermore, a novel simulation mechanism is implemented to adapt MCTS in DNN testing, which attain better coverage feedback. The effectiveness of our method is comprehensively investigated on six popular deep learning models from LeNet and VGG families. The comparison experiments are conducted between DeepHunter, TensorFuzz, and DeepSmartFuzzer to demonstrate efficacy on various testing metrics. The experimental results show that the E x 2 $E{x}^{2}$ significantly enhance the coverage gain of DNN fuzzing up to 30% against the best performance in comparison groups.

show abstract

DeepGini: prioritizing massive tests to enhance the robustness of deep neural networks

Cited by 133 publications

References 31 publications

RobOT: Robustness-Oriented Testing for Deep Learning Systems

RobOT: Robustness-Oriented Testing for Deep Learning Systems

Towards Exploring the Limitations of Active Learning: An Empirical Study

Ex ² : Monte Carlo Tree Search‐based test inputs prioritization for fuzzing deep neural networks

Contact Info

Product

Resources

About

DeepGini: prioritizing massive tests to enhance the robustness of deep neural networks

Cited by 133 publications

References 31 publications

RobOT: Robustness-Oriented Testing for Deep Learning Systems

RobOT: Robustness-Oriented Testing for Deep Learning Systems

Towards Exploring the Limitations of Active Learning: An Empirical Study

Ex 2 : Monte Carlo Tree Search‐based test inputs prioritization for fuzzing deep neural networks

Contact Info

Product

Resources

About

Ex ² : Monte Carlo Tree Search‐based test inputs prioritization for fuzzing deep neural networks