A fundamental question in software testing research is how to compare test suites, often as a means for comparing testgeneration techniques. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the (feasible) requirements is C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given criteria C and C ′ , are C-adequate suites (on average) more effective than C ′ -adequate suites? However, in many realistic cases producing adequate suites is impractical or even impossible.We present the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given criteria C and C ′
Abstract. Container classes such as lists, sets, or maps are elementary data structures common to many programming languages. Since they are a part of standard libraries, they are important to test, which led to research on advanced testing techniques targeting such containers and research on comparing testing techniques using such containers. However, these techniques have not been thoroughly compared to simpler techniques such as random testing. We present the results of a larger case study in which we compare random testing with shape abstraction, a systematic technique that showed the best results in a previous study. Our experiments show that random testing is about as effective as shape abstraction for testing these containers, which raises the question whether containers are well suited as a benchmark for comparing advanced testing techniques.
A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques that produce those test suites. Researchers frequently compare test suites by measuring their coverage . A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the feasible requirements is called C-adequate . Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given two criteria C and C ′, are C -adequate suites on average more effective than C ′-adequate suites? However, in many realistic cases, producing adequate suites is impractical or even impossible. This article presents the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given two criteria C and C ′, which one is better to use to compare test suites? Namely, if suites T 1 , T 2 ,…, T n have coverage values c 1 , c 2 ,…, c n for C and c 1 ′, c 2 ′,…, c n ′ for C ′, is it better to compare suites based on c 1 , c 2 ,…, c n or based on c 1 ′, c 2 ′,…, c n ′ ? We evaluate a large set of plausible criteria, including basic criteria such as statement and branch coverage, as well as stronger criteria used in recent studies, including criteria based on program paths, equivalence classes of covered statements, and predicate states. The criteria are evaluated on a set of Java and C programs with both manually written and automatically generated test suites. The evaluation uses three correlation measures. Based on these experiments, two criteria perform best: branch coverage and an intraprocedural acyclic path coverage. We provide guidelines for testing researchers aiming to evaluate test suites using coverage criteria as well as for other researchers evaluating coverage criteria for research use.
Regression test selection analyzes incremental changes to a codebase and chooses to run only those tests whose behavior may be affected by the latest changes in the code. By focusing on a small subset of all the tests, the testing process runs faster and can be more tightly integrated into the development process. Existing techniques for regression test selection consider two versions of the code at a time, effectively assuming a development process where changes to the code occur in a linear sequence.Modern development processes that use distributed version-control systems are more complex. Software version histories are generally modeled as directed graphs; in addition to version changes occurring linearly, multiple versions can be related by other commands, e.g., branch, merge, rebase, cherry-pick, revert, etc. This paper describes a regression test-selection technique for software developed using modern distributed version-control systems. By modeling different branch or merge commands directly in our technique, it computes safe test sets that can be substantially smaller than applying previous techniques to a linearization of the software history.We evaluate our technique on software histories of several large opensource projects. The results are encouraging: our technique obtained an average of 10.89× reduction in the number of tests over an existing technique while still selecting all tests whose behavior may differ. 1 We use the term "version" for what version-control systems often call "revision".
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.