The growing application of deep neural networks in safety-critical domains makes the analysis of faults that occur in such systems of enormous importance. In this paper we introduce a large taxonomy of faults in deep learning (DL) systems. We manually analysed 1059 artefacts gathered from GitHub commits and issues of projects that use the most popular DL frameworks (TensorFlow, Keras and PyTorch) and from related Stack Overflow posts. Structured interviews with 20 researchers and practitioners, who described the problems encountered in their experience, enriched our taxonomy with a variety of additional faults that did not emerge from the other two sources. Our final taxonomy was validated with a survey involving a further 21 developers, confirming that almost all fault categories (13/15) were experienced by at least 50% of the survey participants.
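To make such fault categories concrete, here is a minimal, hypothetical sketch (not drawn from the paper's artefacts) of one fault type that taxonomies of this kind typically cover: a mismatch between the model's output layer and the label encoding, which surfaces only when training starts.

```python
# Hypothetical DL fault: output layer inconsistent with the label encoding.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(64, activation="relu"),
    # Faulty: a single sigmoid unit for a 3-class problem.
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

x = np.random.rand(32, 10)
y = keras.utils.to_categorical(np.random.randint(0, 3, 32), 3)
model.fit(x, y, epochs=1)  # Fails: targets (32, 3) vs outputs (32, 1)

# The fix is to match the output layer to the labels:
# keras.layers.Dense(3, activation="softmax")
```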
With the increasing adoption of Deep Learning (DL) for critical tasks, such as autonomous driving, the evaluation of the quality of systems that rely on DL has become crucial. Once trained, DL systems produce an output for any arbitrary numeric vector provided as input, regardless of whether it lies within or outside the validity domain of the system under test. Hence, the quality of such systems is determined by the intersection between their validity domain and the regions where their outputs exhibit a misbehaviour. In this paper, we introduce the notion of frontier of behaviours, i.e., the inputs at which the DL system starts to misbehave. If the frontier of misbehaviours lies outside the validity domain of the system, the quality check is passed. Otherwise, the inputs at the intersection represent quality deficiencies of the system. We developed DeepJanus, a search-based tool that generates frontier inputs for DL systems. The experimental results obtained for the lane-keeping component of a self-driving car show that the frontier of a well-trained system contains almost exclusively unrealistic roads that violate the best practices of civil engineering, while the frontier of a poorly trained one includes many valid inputs that point to serious deficiencies of the system.
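The notion of frontier can be illustrated with a minimal sketch. DeepJanus itself relies on a multi-objective evolutionary search; the bisection below is only a simplified stand-in that locates a single frontier point, and the `misbehaves` oracle and seed inputs are placeholder assumptions.

```python
# Simplified frontier localisation: bisect between a well-behaved input
# and a misbehaving one until the pair straddles the frontier.
import numpy as np

def misbehaves(x: np.ndarray) -> bool:
    """Placeholder oracle: True if the DL system misbehaves on x."""
    return float(np.linalg.norm(x)) > 1.0  # toy stand-in

def frontier_pair(good: np.ndarray, bad: np.ndarray, tol: float = 1e-3):
    """Shrink the segment [good, bad] until its endpoints straddle the
    frontier within distance `tol`; returns (still-good, just-bad)."""
    assert not misbehaves(good) and misbehaves(bad)
    while np.linalg.norm(bad - good) > tol:
        mid = (good + bad) / 2.0
        if misbehaves(mid):
            bad = mid
        else:
            good = mid
    return good, bad

inside, outside = frontier_pair(np.zeros(2), np.full(2, 2.0))
```

If `inside` lies within the validity domain of the system, the pair exposes a quality deficiency; if it does not, that part of the frontier is harmless.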
Context: A Machine Learning based System (MLS) is a software system that includes one or more components which learn how to perform a task from a given data set. The increasing adoption of MLSs in safety-critical domains such as autonomous driving, healthcare, and finance has fostered much attention towards the quality assurance of such systems. Despite the advances in software testing, MLSs bring novel and unprecedented challenges, since their behaviour is defined jointly by the code that implements them and the data used for training them. Objective: To identify the existing solutions for functional testing of MLSs, and classify them from three different perspectives: (1) the context of the problem they address, (2) their features, and (3) their empirical evaluation. To report demographic information about the ongoing research. To identify open challenges for future research. Method: We conducted a systematic mapping study about testing techniques for MLSs, driven by 33 research questions. We followed existing guidelines when defining our research protocol so as to increase the repeatability and reliability of our results. Results: We identified 70 relevant primary studies, mostly published in recent years. We identified 11 problems addressed in the literature. We investigated multiple aspects of the testing approaches, such as the used or proposed adequacy criteria, the algorithms for test input generation, and the test oracles. Conclusions: The most active research areas in MLS testing address automated scenario/input generation and test oracle creation. MLS testing is a rapidly growing and developing research area, with many open challenges, such as the generation of realistic inputs and the definition of reliable evaluation metrics and benchmarks.
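As an illustration of one adequacy criterion recurring in this literature, the sketch below computes a simple form of neuron coverage, i.e., the fraction of neurons activated above a threshold by at least one test input, for a Keras model; the zero threshold and the restriction to Dense layers are simplifying assumptions.

```python
# Simple neuron coverage over the Dense layers of a Keras model.
import numpy as np
from tensorflow import keras

def neuron_coverage(model, inputs, threshold=0.0):
    """Fraction of Dense-layer neurons whose activation exceeds
    `threshold` on at least one input in `inputs`."""
    dense_outs = [layer.output for layer in model.layers
                  if isinstance(layer, keras.layers.Dense)]
    extractor = keras.Model(model.inputs, dense_outs)
    outs = extractor(inputs)
    outs = outs if isinstance(outs, list) else [outs]
    covered = total = 0
    for acts in outs:
        acts = np.asarray(acts)                      # (batch, neurons)
        covered += int((acts.max(axis=0) > threshold).sum())
        total += acts.shape[-1]
    return covered / total
```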
Summary: This paper investigates the failures exposed in mobile apps by the mobile-specific event of changing the screen orientation. We focus on GUI failures resulting in unexpected GUI states that should be avoided to improve app quality and to ensure a better user experience. We propose a classification framework that distinguishes 3 main classes of GUI failures due to orientation changes and exploit it in 2 studies that investigate the impact of such failures in Android apps. The studies involved both open-source apps and apps from Google Play that were specifically tested by exposing them to orientation change events. The results showed that more than 88% of these apps were affected by GUI failures, that some classes of GUI failures were more common than others, and that some GUI objects were more frequently involved. The analysis of the apps' source code allowed us to identify 6 classes of common faults causing specific GUI failures.

Keywords: Android bugs, Android testing, GUI failures, GUI testing, mobile testing, orientation change

Introduction: Over the last decade, the number of users of mobile technology and smartphones has increased considerably. The total number of smartphone users worldwide was forecast to surpass 2.5 billion in 2019 [1]. This causes a constant demand for new software applications (apps) running on these devices. As of June 2016, both Android and iOS users had the opportunity to choose from among more than 2 million apps [2]. Mobile technology has radically changed the lifestyle of billions of people around the world. We use mobile apps for several hours every day, entrust them with our sensitive data, and perform a large variety of activities through them, including critical tasks. The demand for app quality has grown together with their spread: users require apps to be reliable, robust, efficient, secure, and usable. As a consequence, software developers should give proper consideration to the quality of their applications by adopting suitable quality assurance techniques, such as testing. In the last decade, the research community has devoted great interest to the mobile app testing field. Several testing approaches have been proposed to assess different quality aspects of mobile applications [3], such as functionality [4], performance [5], security [6,7], responsiveness [8], and energy consumption [9]. Since mobile apps are event-driven systems, many proposed techniques exercise them by means of sequences of events [10]. However, because of the peculiarities of mobile devices, these apps should be tested with purposely crafted approaches [11]. As an example, testing processes should devote particular attention to exercising the apps through mobile-specific events, such as sending an application to the background and resuming it, receiving a call, changing the state of the network connections, or changing the orientation of the device. Among these types of events, the orientation change deserves special attention. It is a peculiar event in mobile platforms that causes the switch of the running app b...
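A check for this class of failures can be sketched with the Appium Python client: rotate the device mid-interaction and verify that the GUI state survives. The app path, element ID, and expected state below are hypothetical placeholders, and API details vary across client versions.

```python
# Hypothetical orientation-change test: user input should survive rotation.
from appium import webdriver
from appium.options.android import UiAutomator2Options
from appium.webdriver.common.appiumby import AppiumBy

options = UiAutomator2Options()
options.app = "/path/to/app-under-test.apk"  # placeholder
driver = webdriver.Remote("http://127.0.0.1:4723", options=options)
try:
    field = driver.find_element(AppiumBy.ID, "com.example:id/input")
    field.send_keys("hello")
    driver.orientation = "LANDSCAPE"  # triggers Activity re-creation
    after = driver.find_element(AppiumBy.ID, "com.example:id/input").text
    assert after == "hello", "GUI failure: user input lost on rotation"
finally:
    driver.quit()
```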
Deep Learning (DL) has been successfully applied to a wide range of application domains, including safety-critical ones. Several DL testing approaches have recently been proposed in the literature, but none of them aims to assess how different interpretable features of the generated inputs affect the system's behaviour. In this paper, we resort to Illumination Search to find the highest-performing test cases (i.e., misbehaving and closest to misbehaving), spread across the cells of a map representing the feature space of the system. We introduce a methodology that guides the users of our approach in the tasks of identifying and quantifying the dimensions of the feature space for a given domain. We developed DeepHyperion, a search-based tool for DL systems that illuminates, i.e., explores at large, the feature space, providing developers with an interpretable feature map where automatically generated inputs are placed along with information about the exposed behaviours.
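The feature-map bookkeeping at the heart of illumination search can be sketched as follows (in the style of MAP-Elites); the feature functions, bin sizes, and fitness below are illustrative assumptions, not the ones used by DeepHyperion.

```python
# MAP-Elites-style archive: bin inputs by interpretable features and keep
# the highest-performing (closest-to-misbehaving) input per cell.
import random

def features(x):
    """Two interpretable feature values for an input (placeholders)."""
    return (len(x), sum(x) / len(x))

def fitness(x):
    """Lower = closer to misbehaviour (placeholder)."""
    return abs(sum(x) - 10)

def cell(feat, bin_size=(2, 0.5)):
    """Discretise a feature vector into the coordinates of a map cell."""
    return tuple(int(f // b) for f, b in zip(feat, bin_size))

archive = {}  # cell -> (fitness, input)
for _ in range(1000):
    x = [random.uniform(0, 5) for _ in range(random.randint(2, 8))]
    c, f = cell(features(x)), fitness(x)
    if c not in archive or f < archive[c][0]:
        archive[c] = (f, x)  # elite for this region of the feature space
```

The populated cells form the interpretable feature map handed back to developers, with each elite exposing the system's behaviour in that region of the feature space.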