2016
DOI: 10.1007/s11219-016-9339-1

Separating passing and failing test executions by clustering anomalies

Abstract: Developments in the automation of test data generation have greatly improved the efficiency of the software testing process, but the so-called oracle problem (deciding the pass or fail outcome of a test execution) is still primarily an expensive and error-prone manual activity. We present an approach to automatically detect passing and failing executions using cluster-based anomaly detection on dynamic execution data based on, firstly, just a system's input/output pairs and, secondly, amalgamations of input/output pairs…
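To make the clustering idea concrete, here is a minimal sketch of the general approach the abstract describes: encode each execution's input/output pair as a feature vector, cluster the vectors, and flag members of smaller-than-average clusters as suspected failures. The numeric encoding, the choice of k-means, the cluster count k, and the synthetic data are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: cluster-based anomaly detection over test executions.
# All concrete choices below (features, k-means, k, threshold) are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic "executions": each row encodes one test's input/output pair as a
# numeric feature vector. Most executions behave alike; a few are anomalous.
normal = rng.normal(loc=0.0, scale=1.0, size=(95, 4))
anomalous = rng.normal(loc=6.0, scale=0.5, size=(5, 4))
executions = np.vstack([normal, anomalous])

# Cluster the executions; k is a tunable assumption here.
k = 8
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(executions)

# Flag executions that fall into smaller-than-average clusters as suspected
# failures, mirroring the "small clusters are failure-rich" observation.
sizes = np.bincount(labels, minlength=k)
avg_size = executions.shape[0] / k
suspect = np.flatnonzero(sizes[labels] < avg_size)
print(f"{len(suspect)} executions flagged for manual inspection: {suspect}")
```

Under this scheme a tester only manually checks the flagged executions, which is the labour saving the approach aims for.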

Cited by 22 publications (33 citation statements)
References 21 publications
“…In an earlier study [2] we explored a range of clustering algorithms using either just test inputs and outputs, or inputs, outputs and execution traces, and found that small (less than average sized) clusters contained more than 60% of failures (and often a substantially higher proportion). Moreover, as well as having a higher failure density they also contained a spread of failures in the cases where there were multiple faults in the programs.…”
Section: Overview of a Test Classification Strategy (mentioning)
confidence: 99%
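The small-cluster observation quoted above suggests a simple check when ground-truth verdicts are available: measure what share of all failures lands in below-average-sized clusters. A rough sketch, with invented cluster assignments and verdicts standing in for real execution data:

```python
# Sketch: validate the "small clusters hold most failures" observation.
# Cluster ids and pass/fail verdicts below are made-up stand-ins.
import numpy as np

cluster_of = np.array([0]*40 + [1]*35 + [2]*15 + [3]*6 + [4]*4)  # per-test cluster id
failed = np.zeros(100, dtype=bool)
failed[[10, 78, 92, 93, 96, 97, 98, 99]] = True                  # hypothetical failures

k = cluster_of.max() + 1
sizes = np.bincount(cluster_of, minlength=k)
small = sizes < sizes.mean()            # "less than average sized" clusters

in_small = small[cluster_of]            # per-test flag: lies in a small cluster
share = failed[in_small].sum() / failed.sum()
print(f"Small clusters: {np.flatnonzero(small)}")
print(f"Share of all failures captured by small clusters: {share:.0%}")
```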
“…In this case derived oracles are commonly used to decrease the number of tests to manually examine or to ease the validation. For example, existing tests can be used to generate more meaningful tests [25], similarity between executions can be used to pinpoint suspicious asserts [32], or clustering techniques can be used to group potentially faulty tests [1]. Moreover, if there are multiple versions of the implementation (e.g., regression testing [47] or different implementations for the same specification [29]), tests generated from one version could be executed on the other one.…”
Section: Related Work (mentioning)
confidence: 99%
“…Thus the question that motivated, and served as a basis of, our research is the following: How do developers perform in using the tests generated from code to detect faults and decide whether the implementation is correct?¹ This question is mainly motivated by the fact that the actual fault-finding capability of white-box test generator tools could be much lower than already reported. (¹ Note that if a test generated from a faulty implementation encodes a fault but passes, then the test can be considered faulty as well. Therefore, classifying the tests as faulty or correct could reveal a faulty implementation.)…”
Section: Introduction (mentioning)
confidence: 99%
“…Previous work by the authors has explored the use of machine learning techniques to support the automatic classification of test outcomes as either passing or failing, thereby providing a form of test oracle [1], [2], [3], but their relative performance, strengths, and weaknesses have not been statistically analysed and compared with existing techniques. The aim of this study is to investigate and extensively evaluate these approaches to test oracle construction in terms of effectiveness when they are applied to medium-sized subject systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…The aim of this study is to investigate and extensively evaluate these approaches to test oracle construction in terms of effectiveness when they are applied to medium-sized subject systems. The empirical evaluation in this paper can be summarised as follows: (1) statistical verification is applied to two different sets of experimental results (in the first experiment, the input to the machine learning techniques consisted of just the test case inputs along with their associated outputs, and the second experiment extended this by adding to the input/output pairs their corresponding execution traces); (2) new results are presented that evaluate the effectiveness of our machine learning techniques by calculating the accuracy, recall, and false positive rate; (3) a comparison between existing techniques from the specification mining domain (the data invariant detector Daikon [4]) and machine learning techniques is reported (Daikon was selected because it was the most effective oracle from a set of dynamic analysis techniques explored in a previous study [8]). The study is useful for testers because they need to be able to assess the features offered by these oracles, and also for the developers of oracle-based approaches to further understand the strengths and weaknesses of these different techniques and how they can be developed.…”
Section: Introduction (mentioning)
confidence: 99%
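For reference, the three effectiveness measures named in this excerpt are straightforward to compute from a confusion matrix once each test's predicted and actual verdicts are known, treating "fail" as the positive class. A small sketch with invented verdict vectors:

```python
# Sketch: accuracy, recall, and false positive rate for a pass/fail
# classifier, with "fail" as the positive class. Verdicts are invented.
import numpy as np

actual_fail    = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0], dtype=bool)
predicted_fail = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0], dtype=bool)

tp = np.sum(predicted_fail & actual_fail)    # failures correctly flagged
tn = np.sum(~predicted_fail & ~actual_fail)  # passes correctly cleared
fp = np.sum(predicted_fail & ~actual_fail)   # passes wrongly flagged
fn = np.sum(~predicted_fail & actual_fail)   # failures missed

accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall agreement
recall   = tp / (tp + fn)                    # share of failures caught
fpr      = fp / (fp + tn)                    # share of passes misreported
print(f"accuracy={accuracy:.2f} recall={recall:.2f} FPR={fpr:.2f}")
```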