Importance-driven deep learning system testing

Gerasimou, Simos; Enişer, Hasan Ferit; Şen, Alper; Cakan, Alper

doi:10.1145/3377812.3390793

Cited by 11 publications

(12 citation statements)

References 28 publications

(50 reference statements)


“…DeepXplore [83] introduced the neuron coverage metric to measure the percentage of activated neurons for a given test suite and DNN model, and generates new test inputs that can maximize the metric to test DL systems. Many others [70,[84][85][86][87][88] extended the coverage concept and proposed to use them on many different scenarios. Model testing has also been leveraged for many other domains such as image classification [79,89], automatic speech recognition [90], text classification [74], and machine translation [91,92].…”

Section: Effects of Configurable Parameters (mentioning)

confidence: 99%

AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System

Zhang, Zhai, et al. 2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)


With machine learning models especially Deep Neural Network (DNN) models becoming an integral part of the new intelligent software, new tools to support their engineering process are in high demand. Existing DNN debugging tools are either post-training which wastes a lot of time training a buggy model and requires expertises, or limited on collecting training logs without analyzing the problem not even fixing them. In this paper, we propose AUTOTRAINER, a DNN training monitoring and automatic repairing tool which supports detecting and auto-repairing five commonly seen training problems. During training, it periodically checks the training status and detects potential problems. Once a problem is found, AUTOTRAINER tries to fix it by using built-in state-of-the-art solutions. It supports various model structures and input data types, such as Convolutional Neural Networks (CNNs) for image and Recurrent Neural Networks (RNNs) for texts. Our evaluation on 6 datasets, 495 models show that AUTOTRAINER can effectively detect all potential problems with 100% detection rate and no false positives. Among all models with problems, it can fix 97.33% of them, increasing the accuracy by 47.08% on average.

Index Terms: software engineering, software tools, deep learning training


“…We select the state-of-the-art coverage criteria as the baselines for the evaluation, i.e., Neuron Coverage (NC) [39], 𝑘-Multisection Neuron Coverage (KMNC) [31], Neuron Boundary Coverage (NBC) [31], Likelihood-based Surprise Coverage (LSC) [31], Distance-based Surprise Coverage (DSC) [23] and Importance-Driven Coverage (IDC) [15]. Due to that the original IDC only supports Keras models, we implemented a PyTorch version of IDC for the comparison.…”

Section: Baselines (mentioning)

confidence: 99%

“…The critical neurons between the layers form the CDP. Compared with the existing neuron-based coverage metric (e.g., NC [39], 𝑘-multisection Neuron Coverage [31], IDC [15]), CDP considers not only the critical neurons in one layer but also the relationships among layers. Moreover, the CDP is clearly related to decision-making.…”

Section: Introduction (mentioning)

confidence: 99%

NPC: Neuron Path Coverage via Characterizing Decision Logic of Deep Neural Networks

Xie, Li, Wang, et al. 2022

Preprint


Deep learning has recently been widely applied to many applications across different domains, e.g., image classification and audio recognition. However, the quality of Deep Neural Networks (DNNs) still raises concerns in the practical operational environment, which calls for systematic testing, especially in safety-critical scenarios. Inspired by software testing, a number of structural coverage criteria are designed and proposed to measure the test adequacy of DNNs. However, due to the black-box nature of DNN, the existing structural coverage criteria are difficult to interpret, making it hard to understand the underlying principles of these criteria. The relationship between the structural coverage and the decision logic of DNNs is unknown. Moreover, recent studies have further revealed the non-existence of correlation between the structural coverage and DNN defect detection, which further posts concerns on what a suitable DNN testing criterion should be. In this paper, we propose the interpretable coverage criteria through constructing the decision structure of a DNN. Mirroring the control flow graph of the traditional program, we first extract a decision graph from a DNN based on its interpretation, where a path of the decision graph represents a decision logic of the DNN. Based on the control flow and data flow of the decision graph, we propose two variants of path coverage to measure the adequacy of the test cases in exercising the decision logic. The higher the path coverage, the more diverse decision logic the DNN is expected to be explored. Our large-scale evaluation results demonstrate that: the path in the decision graph is effective in characterizing the decision of the DNN, and the proposed coverage criteria are also sensitive with errors including natural errors and adversarial examples, and strongly correlated with the output impartiality.

CCS Concepts: • Software and its engineering → Software testing and debugging; • Computing methodologies → Neural networks.


“…[50,60,213]). One approach proposed a technique to select test cases based on a metric of importance [65], whereas others proposed techniques to identify corner cases [22], adversarial examples [204] or likely failure scenarios [107]. Finally, a few approaches proposed techniques for test input prioritization to select the most important ones and reduce the cost of labeling [34,54] or reduce the performance cost of training and testing huge amounts of data [186].…”

Section: Software Testing (115 Studies) (mentioning)

confidence: 99%

“…14 out of 20 focused on test coverage metrics (e.g., [22,75,190]), whereas the rest of metrics were reported only by one study each: diversity [187], importance [65], suspiciousness [52], probability of sufficiency [36], and disagreement [216].…”

Section: Software Testing (115 Studies) (mentioning)

confidence: 99%

Software Engineering for AI-Based Systems: A Survey

Martínez-Fernández, Bogner, Franch, et al. 2021

Preprint


AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image and speech recognition, and autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area, where more than 2/3 of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas like software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for: researchers, to quickly understand the state of the art and learn which topics need more research; practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and, educators, to bridge the gap among SE and AI in their curricula.

CCS Concepts: • Software and its engineering → Software creation and management; • Computing methodologies → Machine learning;

