Regression testing is important but can be time-intensive. One approach to speed it up is regression test selection (RTS), which runs only a subset of tests. RTS was proposed over three decades ago but has not been widely adopted in practice. Meanwhile, testing frameworks, such as JUnit, are widely adopted and well integrated with many popular build systems. Hence, integrating RTS into a testing framework already used by many projects would increase the likelihood that RTS is adopted. We propose a new, lightweight RTS technique, called Ekstazi, that can integrate well with testing frameworks. Ekstazi tracks dynamic dependencies of tests on files and, unlike most prior RTS techniques, requires no integration with version-control systems. We implemented Ekstazi for Java and JUnit, and evaluated it on 615 revisions of 32 open-source projects (totaling almost 5M LOC) with shorter- and longer-running test suites. The results show that Ekstazi reduced the end-to-end testing time by 32% on average, and by 54% for longer-running test suites, compared to executing all tests. Ekstazi also has a lower end-to-end time than existing techniques, despite selecting more tests. Ekstazi has been adopted by several popular open-source projects, including Apache Camel, Apache Commons Math, and Apache CXF.
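To make the idea concrete, below is a minimal Java sketch of file-level, checksum-based test selection, assuming a test's file dependencies were recorded during its previous run; the class and method names are illustrative and are not Ekstazi's actual API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.Map;

// Illustrative sketch of checksum-based regression test selection
// (not Ekstazi's actual API): a test is selected for re-execution
// only if at least one of its recorded file dependencies changed.
public class ChecksumBasedSelection {

    // Checksum the current content of a file on disk.
    static String checksum(Path file) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                                     .digest(Files.readAllBytes(file));
        return HexFormat.of().formatHex(digest);
    }

    // Decide whether a test must be re-run, given the checksums
    // recorded for its dependencies after the previous run.
    static boolean mustRerun(Map<Path, String> recordedDeps) throws Exception {
        for (Map.Entry<Path, String> dep : recordedDeps.entrySet()) {
            Path file = dep.getKey();
            // A deleted or modified dependency invalidates the test.
            if (!Files.exists(file) || !checksum(file).equals(dep.getValue())) {
                return true;
            }
        }
        return false;  // all dependencies unchanged: safe to skip this test
    }
}
```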
We present an approach for describing tests using non-deterministic test generation programs. To write test generation programs, we introduce UDITA, a Java-based language with non-deterministic choice operators and an interface for generating linked structures. We also describe new algorithms that generate concrete tests by efficiently exploring the space of all executions of non-deterministic UDITA programs. We implemented our approach and incorporated it into the official, publicly available repository of Java PathFinder (JPF), a popular tool for verifying Java programs. We evaluate our technique by generating tests for data structures, refactoring engines, and JPF itself. Our experiments show that test generation using UDITA is faster and leads to test descriptions that are easier to write than in previous frameworks. Moreover, the novel execution mechanism of UDITA is essential for making test generation feasible. Using UDITA, we have discovered a number of previously unknown bugs in Eclipse, NetBeans, Sun javac, and JPF.
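As an illustration of the style of test generation program described above, here is a small Java sketch that generates bounded binary trees with a non-deterministic integer choice operator; the Choice interface is a hypothetical stand-in for UDITA's primitives, and a backtracking engine such as JPF would enumerate all executions of the program, yielding one concrete tree per execution path.

```java
// Hypothetical stand-in for a non-deterministic choice operator,
// not necessarily UDITA's exact API.
interface Choice {
    int getInt(int lo, int hi);  // non-deterministically pick an int in [lo, hi]
}

class Node {
    int value;
    Node left, right;
}

public class BoundedTreeGenerator {

    // Generate a binary tree of at most maxNodes nodes; exploring all
    // executions of this program produces all trees within the bound.
    static Node generate(Choice c, int maxNodes) {
        int size = c.getInt(0, maxNodes);  // choose the tree size
        return buildTree(c, size);
    }

    private static Node buildTree(Choice c, int size) {
        if (size == 0) {
            return null;
        }
        Node root = new Node();
        root.value = c.getInt(0, size);        // choose a node value
        int leftSize = c.getInt(0, size - 1);  // choose how to split remaining nodes
        root.left = buildTree(c, leftSize);
        root.right = buildTree(c, size - 1 - leftSize);
        return root;
    }
}
```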
Mutation testing is a powerful methodology for evaluating the quality of a test suite. However, it is also very costly, as the test suite may have to be executed for each mutant. Selective mutation testing is a well-studied technique that reduces this cost by selecting a subset of all mutants, which would otherwise have to be considered in their entirety. Two common approaches are operator-based mutant selection, which only generates mutants using a subset of mutation operators, and random mutant selection, which selects a subset of mutants generated using all mutation operators. While each of the two approaches provides some reduction in the number of mutants to execute, applying either to medium-sized, real-world programs can still generate a huge number of mutants, which makes their execution too expensive. This paper presents eight random sampling strategies defined on top of operator-based mutant selection, and empirically validates that operator-based selection and random selection can be applied in tandem to further reduce the cost of mutation testing. The experimental results show that sampling even only 5% of the mutants generated by operator-based selection can still provide precise mutation testing results, while reducing the average mutation testing time to 6.54% of the original (i.e., on average less than 5 minutes in this study).
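A minimal Java sketch of the sampling step is shown below, assuming mutants have already been produced by operator-based selection; the Mutant type and its isKilledBy hook are hypothetical placeholders rather than part of any particular mutation tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch of random sampling on top of operator-based mutant
// selection: keep only a fixed percentage of the mutants and estimate the
// mutation score on that sample.
public class MutantSampling {

    interface Mutant {
        boolean isKilledBy(Runnable testSuite);  // hypothetical: run the suite on this mutant
    }

    // Randomly sample `percent`% of the operator-selected mutants.
    static List<Mutant> sample(List<Mutant> operatorSelected, double percent, long seed) {
        List<Mutant> copy = new ArrayList<>(operatorSelected);
        Collections.shuffle(copy, new Random(seed));
        int keep = (int) Math.ceil(copy.size() * percent / 100.0);
        return copy.subList(0, keep);
    }

    // Mutation score on the sample: fraction of sampled mutants the suite kills.
    static double mutationScore(List<Mutant> sampled, Runnable testSuite) {
        long killed = sampled.stream().filter(m -> m.isKilledBy(testSuite)).count();
        return sampled.isEmpty() ? 0.0 : (double) killed / sampled.size();
    }
}
```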
A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the (feasible) requirements is C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given criteria C and C′, are C-adequate suites (on average) more effective than C′-adequate suites? However, in many realistic cases, producing adequate suites is impractical or even impossible. We present the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given criteria C and C′, which criterion is better for comparing test suites?
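For concreteness, a minimal sketch of how a coverage criterion scores a possibly non-adequate suite is shown below; requirements and the satisfied set are represented abstractly as strings, which is an illustrative simplification.

```java
import java.util.Set;

// Illustrative sketch: a criterion supplies a set of requirements, and a
// suite's coverage is the fraction of those requirements it satisfies.
public class CoverageRatio {

    // Returns coverage in [0, 1]; a value of 1.0 means the suite is C-adequate.
    static double coverage(Set<String> requirements, Set<String> satisfied) {
        if (requirements.isEmpty()) {
            return 1.0;  // vacuously adequate
        }
        long met = satisfied.stream().filter(requirements::contains).count();
        return (double) met / requirements.size();
    }
}
```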
Regression testing is an important activity but can get expensive for large test suites. Test-suite reduction speeds up regression testing by identifying and removing redundant tests based on a given set of requirements. Traditional research on test-suite reduction is rather diverse but most commonly shares three properties: (1) requirements are defined by a coverage criterion such as statement coverage; (2) the reduced test suite has to satisfy all the same requirements as the original test suite; and (3) the quality of the reduced test suite is measured on the software version on which the reduction is performed. These properties make it hard for test engineers to decide how to use reduced test suites. We address all three properties of traditional test-suite reduction: (1) we evaluate test-suite reduction with requirements defined by killed mutants; (2) we evaluate inadequate reduction that does not require reduced test suites to satisfy all the requirements; and (3) we propose evolution-aware metrics that evaluate the quality of reduced test suites across multiple software versions. Our evaluations allow a more thorough exploration of trade-offs in test-suite reduction, and our evolution-aware metrics show how the quality of reduced test suites can change after the version where the reduction is performed. We compare the trade-offs among various reductions on 18 projects with a total of 261,235 tests over 3,590 commits and a cumulative history spanning 35 years of development. Our results help test engineers make a more informed decision about balancing size, coverage, and fault-detection loss of reduced test suites.
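As an illustration, the classic greedy reduction loop can be sketched in Java as follows, with requirements represented abstractly as strings (they could be covered statements or, as in the evaluation above, killed mutants); this is a generic sketch, not the paper's exact algorithm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative greedy test-suite reduction: repeatedly keep the test that
// satisfies the most still-uncovered requirements. An inadequate reduction
// could simply stop the loop early, e.g., once a size budget is reached.
public class GreedyReduction {

    static List<String> reduce(Map<String, Set<String>> requirementsPerTest) {
        // Start with every requirement satisfied by at least one test.
        Set<String> uncovered = new HashSet<>();
        requirementsPerTest.values().forEach(uncovered::addAll);

        List<String> reduced = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<String>> e : requirementsPerTest.entrySet()) {
                Set<String> gain = new HashSet<>(e.getValue());
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    bestGain = gain.size();
                    best = e.getKey();
                }
            }
            if (best == null) {
                break;  // safety: no remaining test covers an uncovered requirement
            }
            reduced.add(best);
            uncovered.removeAll(requirementsPerTest.get(best));
        }
        return reduced;
    }
}
```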
Regression test selection speeds up regression testing by re-running only the tests that can be affected by the most recent code changes. Much progress has been made on research in automated test selection over the last three decades, but it has not translated into practical tools that are widely adopted. Therefore, developers either re-run all tests after each change or perform manual test selection. Re-running all tests is expensive, while manual test selection is tedious and error-prone. Despite such a big trade-off, no study has assessed how developers perform manual test selection and compared it to automated test selection. This paper reports on our study of manual test selection in practice and our comparison of manual and automated test selection. We are the first to conduct a study that (1) analyzes data on manual test selection, collected in real time from 14 developers during a three-month study, and (2) compares manual test selection with an automated state-of-the-research test-selection tool across 450 test sessions. Almost all developers in our study performed manual test selection, and they did so in mostly ad-hoc ways. Comparing manual and automated test selection, we found that the two approaches selected different tests in every one of the 450 test sessions investigated. Manual selection chose more tests than automated selection 73% of the time (potentially wasting time) and fewer tests 27% of the time (potentially missing bugs). These results show the need for better automated test-selection techniques that integrate well with developers' programming environments.
Container classes such as lists, sets, or maps are elementary data structures common to many programming languages. Since they are part of standard libraries, they are important to test, which has led to research on advanced testing techniques targeting such containers and on comparing testing techniques using such containers. However, these techniques have not been thoroughly compared with simpler techniques such as random testing. We present the results of a larger case study in which we compare random testing with shape abstraction, a systematic technique that showed the best results in a previous study. Our experiments show that random testing is about as effective as shape abstraction for testing these containers, which raises the question of whether containers are well suited as a benchmark for comparing advanced testing techniques.
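A minimal Java sketch of random testing for a container, in the spirit of the comparison above, is shown below: it issues a random sequence of add/remove/contains operations against a container under test and cross-checks results against a reference oracle; the specific containers, seed, and bounds are illustrative choices.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.TreeSet;

// Illustrative random testing driver for a container class: compare the
// container under test against a trusted reference implementation.
public class RandomContainerTest {

    public static void main(String[] args) {
        Random rnd = new Random(42);
        TreeSet<Integer> underTest = new TreeSet<>();  // container under test
        HashSet<Integer> reference = new HashSet<>();  // trusted oracle

        for (int i = 0; i < 10_000; i++) {
            int key = rnd.nextInt(50);
            switch (rnd.nextInt(3)) {
                case 0 -> { underTest.add(key); reference.add(key); }
                case 1 -> { underTest.remove(key); reference.remove(key); }
                default -> {
                    // Membership queries must agree with the oracle.
                    if (underTest.contains(key) != reference.contains(key)) {
                        throw new AssertionError("Mismatch on key " + key);
                    }
                }
            }
        }
        System.out.println("No mismatches in 10,000 random operations");
    }
}
```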