Threats to the validity of mutation-based test assessment

Papadakis, Mike; Henard, Christopher; Harman, Mark; Jia, Yue; Le Traon, Yves

doi:10.1145/2931037.2931040

Cited by 105 publications

(145 citation statements)

References 60 publications

Supporting

Mentioning: 140

Contrasting


“…An emerging question regards the optimal use of mutants when comparing testing methods, i.e., whether methods should be applied on the original (clean) or on the mutant versions of the programs. Similarly, the relation of specific kinds of mutants, such as the subsuming [56] and hard to kill [57] ones, with real faults and their actual contribution within the testing process form other important aspects that we plan to investigate.…”

Section: Discussion (mentioning)

confidence: 99%

An Empirical Study on Mutation, Statement and Branch Coverage Fault Revelation That Avoids the Unreliable Clean Program Assumption

Chekam, Papadakis, Traon, et al. 2017

2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)

Self Cite


Abstract: Many studies suggest using coverage concepts, such as branch coverage, as the starting point of testing, while others treat them as the most prominent test quality indicator. Yet the relationship between coverage and fault-revelation remains unknown, yielding uncertainty and controversy. Most previous studies rely on the Clean Program Assumption, that a test suite will obtain similar coverage for both faulty and fixed ('clean') program versions. This assumption may appear intuitive, especially for bugs that denote small semantic deviations. However, we present evidence that the Clean Program Assumption does not always hold, thereby raising a critical threat to the validity of previous results. We then conducted a study using a robust experimental methodology that avoids this threat to validity, from which our primary finding is that strong mutation testing has the highest fault revelation of four widely-used criteria. Our findings also revealed that fault revelation starts to increase significantly only once relatively high levels of coverage are attained.


“…The actual differences are thin and are due to the selection procedure. Disjoint mutants are a subset with minimum joint killings, approximated through a greedy heuristic [1], [6]. Surface mutants [12] are also approximated by a similar heuristic.…”

Section: B. Set-based MQIs (mentioning)

confidence: 99%
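
The statement above mentions that disjoint mutants are approximated through a greedy heuristic but gives no detail. The following is a minimal sketch of one such heuristic, under stated assumptions: the kill matrix is given as a mapping from mutant name to the set of tests that kill it, and "A subsumes B" is taken to mean that every test killing A also kills B. The function name `greedy_disjoint_mutants` and the tie-breaking by name are hypothetical choices for illustration and may differ from the procedure in the cited papers [1], [6].

```python
from typing import AbstractSet, Dict, List


def greedy_disjoint_mutants(kills: Dict[str, AbstractSet[str]]) -> List[str]:
    """Greedily approximate a disjoint mutant set from a kill matrix.

    kills maps each mutant name to the set of tests that kill it
    (an assumed input format, not taken from the cited papers).
    """
    # Discard live mutants: no test kills them, so they carry no signal here.
    killed = {m: frozenset(t) for m, t in kills.items() if t}

    # Discard duplicates: keep one mutant (first by name) per distinct kill set.
    by_kill_set: Dict[frozenset, str] = {}
    for name in sorted(killed):
        by_kill_set.setdefault(killed[name], name)
    candidates = {name: ks for ks, name in by_kill_set.items()}

    selected: List[str] = []
    while candidates:
        def subsumed_by(m: str) -> List[str]:
            # m subsumes o when every test that kills m also kills o.
            return [o for o in candidates
                    if o != m and candidates[m] <= candidates[o]]

        # Pick the mutant that subsumes the most remaining mutants.
        best = max(sorted(candidates), key=lambda m: len(subsumed_by(m)))
        selected.append(best)
        for other in subsumed_by(best):
            del candidates[other]
        del candidates[best]
    return selected


if __name__ == "__main__":
    example = {
        "m1": {"t1"},
        "m2": {"t1", "t2"},
        "m3": {"t3"},
        "m4": set(),      # live mutant
        "m5": {"t1"},     # duplicate of m1
    }
    print(greedy_disjoint_mutants(example))
```

In this toy input, `m2` is dropped because any test that kills `m1` also kills it, which is the kind of redundancy that can inflate a mutation score computed over all mutants.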

“…We used the Codeflaws benchmark [13] that involves programs selected from on-line programming contests. In Codeflaws, every faulty program version is unique and has two instances, the 'faulty' and the 'fixed' one.…”

Section: A. Programs and Faults (mentioning)

confidence: 99%

“…Thus, it is likely that one can achieve a good mutation score by simply killing bad mutants and not the good ones. Unfortunately, this fact can have serious implications on the confidence inspired by mutation testing [1]. Therefore, a first finding is that the majority of the mutants are bad ones according to every quality indicator.…”

Section: A. Prevalence of Mutant Quality Indicator Categories (mentioning)

confidence: 99%

“…Naturally, in mutation testing the 'quality' of mutants plays a central role and can have major implications on the performed analysis. For instance, empirical studies may come to biased conclusions if they use all available mutants [1]. Similarly, the use of restrictive mutant sets may result in a much lower strength testing [2].…”

Section: Introduction (mentioning)

confidence: 99%


Mutant Quality Indicators

Papadakis, Chekam, Traon 2018

2018 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)

Self Cite


Abstract: The question of which are the valuable mutants has received little attention in mutation testing literature. Naturally, the choice of mutants impacts the quality of the performed analysis and has the potential of changing the conclusions of empirical studies. To this end, we collect definitions related to mutant quality indicators and analyse their relations. We identify two classes of indicators, related to individual mutants (Fault Revealing, Subsuming, Hard-to-kill and Stubborn) and to mutant sets (disjoint/dominator and distinguished). We analyse a large set of mutants from 3,902 (real) faulty program versions, belonging to 40 fault classes, collected from an on-line programming contest. Our analysis categorises mutants as valuable, according to the studied quality indicators, profiles their types and examines the relations between them. Our results suggest that there is a large disagreement between the indicators and that the connection between mutant type, its quality and its ability to reveal faults is weak. Additionally, our paper reveals that the ability of mutants to uncover faults differs significantly across the different fault classes and that some mutant types are well linked (or completely disconnected) to specific fault classes.


Dissimilarity‐based test case prioritization through data fusion

Huang, Towey, et al. 2022

Softw Pract Exp


Test case prioritization (TCP) aims at scheduling test case execution so that more important test cases are executed as early as possible. Many TCP techniques have been proposed, according to different concepts and principles, with dissimilarity-based TCP (DTCP) prioritizing tests based on the concept of test case dissimilarity: DTCP chooses the next test case from a set of candidates such that the chosen test case is farther away from previously selected test cases than the other candidates. DTCP techniques typically only use one aspect/granularity of the information or features from test cases to support the prioritization process. In this article, we adopt the concept of data fusion to propose a new family of DTCP techniques, data-fusion-driven DTCP (DDTCP), which attempts to use different information granularities for prioritizing test cases by dissimilarity. We performed an empirical study involving 30 versions of five subject programs, investigating the testing effectiveness and efficiency by comparing DDTCP against DTCP techniques that use a dissimilarity granularity. The experimental results show that not only does DDTCP have better fault-detection rates than single-granularity DTCP techniques, but it also appears to only incur similar prioritization costs. The results also show that DDTCP remains robust over multiple system releases.
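
The selection rule described in this abstract (pick the candidate farthest from the tests already chosen) can be sketched as follows. This is only an illustration under assumptions not stated in the abstract: each test is represented by the set of code elements it covers, distance is Jaccard distance, "farthest" means the largest minimum distance to the already selected tests, and the first test is the one with the largest coverage. The names `jaccard_distance` and `prioritize` are hypothetical, and the sketch uses a single granularity, so it corresponds to plain DTCP and not to the data-fusion variant.

```python
from typing import AbstractSet, Dict, List


def jaccard_distance(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Return 1 - |a & b| / |a | b|, with 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def prioritize(coverage: Dict[str, AbstractSet[int]]) -> List[str]:
    """Order tests so that each next test is far from those already chosen.

    coverage maps a test name to the set of covered element ids
    (an assumed representation for this sketch).
    """
    remaining = sorted(coverage)
    if not remaining:
        return []

    # Assumption: seed the order with the test that covers the most elements.
    first = max(remaining, key=lambda t: len(coverage[t]))
    order = [first]
    remaining.remove(first)

    while remaining:
        # Max-min rule: maximise the distance to the closest selected test.
        nxt = max(
            remaining,
            key=lambda t: min(
                jaccard_distance(coverage[t], coverage[s]) for s in order
            ),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    cov = {"a": {1, 2, 3}, "b": {1, 2}, "c": {7, 8}}
    print(prioritize(cov))
```

Here test `c` is scheduled before `b` because it shares no covered elements with `a`, whereas `b` covers a subset of what `a` already exercised.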
