Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020)
DOI: 10.1145/3368089.3409671
Correlations between deep neural network model coverage criteria and model quality

Cited by 55 publications (58 citation statements). References 64 publications.
“…d) Difference between Testing and Defect Detection: A recent paper [50] on correlations between coverage criteria and model quality suggests that coverage-guided testing complements gradient-based adversarial attacks. The authors find that adversarial samples found by FNN coverage-guided testing can be further utilised to retrain more robust models.…”
Section: H. Threats to Validity (mentioning)
confidence: 99%
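The statement above contrasts coverage-guided testing with gradient-based adversarial attacks. As a minimal illustration of the latter (not the cited paper's method), the fast gradient sign method can be sketched for a logistic-regression "model" in pure NumPy; the weights, input, label, and epsilon below are arbitrary assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM: perturb x by eps in the sign of the loss gradient w.r.t. x.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y) * w, where p = sigmoid(w.x + b).
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Toy weights and a single input with label y = 1 (illustrative only).
w = np.array([1.5, -2.0])
b = 0.1
x = np.array([0.4, 0.3])
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.25)

# Cross-entropy loss for label y = 1; the attack should not decrease it.
loss = lambda v: -np.log(sigmoid(v @ w + b))
print(loss(x), loss(x_adv))
```

The perturbation is bounded by eps in the infinity norm, which is the usual FGSM constraint; the adversarial input incurs a strictly higher loss here.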
“…Model testing has also been leveraged for many other domains such as image classification [79,89], automatic speech recognition [90], text classification [74], and machine translation [91,92]. Recently, Yan et al. [93] have studied many coverage criteria and measured their correlations with model quality (i.e., model robustness against adversarial attacks), and empirical results show that existing criteria cannot faithfully reflect model quality.…”
Section: Effects of Configurable Parameters (mentioning)
confidence: 99%
“…DeepXplore [52] proposed a metric called neuron coverage for white-box testing of DL models and leveraged gradient-based techniques to search for more effective tests. While various other metrics [39,43] have also been proposed recently, the correlation between such metrics and the robustness of models remains unclear [25,36,69]. Meanwhile, there is also a series of work targeting specific applications, such as autonomous driving, including DeepTest [63], DeepRoad [71], and DeepBillboard [78].…”
Section: Related Work: DL Models (mentioning)
confidence: 99%
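Neuron coverage, as introduced by DeepXplore, counts a neuron as covered if its activation exceeds a threshold on at least one test input. A minimal NumPy sketch of this idea (the toy two-layer ReLU network, random inputs, and threshold are assumptions for illustration, not the paper's setup):

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold`
    on at least one test input.

    `activations` is a list of (n_inputs, n_neurons) arrays, one per layer.
    """
    covered = 0
    total = 0
    for layer in activations:
        # A neuron is covered if any input pushes it past the threshold.
        covered += int(np.sum(layer.max(axis=0) > threshold))
        total += layer.shape[1]
    return covered / total

# Tiny two-layer ReLU MLP evaluated on random inputs (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
w1 = rng.normal(size=(4, 8))
w2 = rng.normal(size=(8, 3))
h1 = np.maximum(x @ w1, 0.0)   # hidden-layer activations
h2 = np.maximum(h1 @ w2, 0.0)  # output-layer activations
nc = neuron_coverage([h1, h2], threshold=0.0)
print(f"neuron coverage: {nc:.2f}")
```

Coverage-guided testing then searches for inputs that raise this fraction, on the hypothesis (questioned by the surveyed work) that higher coverage exposes more defects.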
“…Due to the popularity of DL models and the critical importance of their reliability, a growing body of research has been dedicated to testing DL models, with a focus on adversarial attacks [14,21,32,[46][47][48]] for model robustness, the discussion of various metrics for DL model testing [36,39,43,52,69], and testing DL models for specific applications [63,71,78]. Meanwhile, both running and testing DL models inevitably involve the underlying DL libraries, which serve as central pieces of infrastructure for building, training, optimizing, and deploying DL models.…”
Section: Introduction (mentioning)
confidence: 99%