2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)
DOI: 10.1109/icse.2019.00110
Test-Driven Code Review: An Empirical Study

Abstract: Test-Driven Code Review (TDR) is a code review practice in which a reviewer inspects a patch by examining the changed test code before the changed production code. Although this practice has been mentioned positively by practitioners in informal literature and interviews, there is no systematic knowledge of its effects, prevalence, problems, and advantages. In this paper, we aim at empirically understanding whether this practice has an effect on code review effectiveness and how developers perceive TDR. We co…

Cited by 28 publications (19 citation statements)
References 44 publications
“…Our participants revealed that finding how to design a test code to avoid flakiness is an important challenge to face. This motivates the growing research area around test code quality [18,29,[33][34][35][36][37] and provides two promising directions that the research community can focus on: (i) the definition of a set of design patterns that can support the creation of deterministic tests; (ii) the definition of a set of flakiness-related anti-patterns that practitioners should avoid when writing test cases. While some initial steps have been done about the relation between test smells and flaky tests [31,32], further investigation is necessary.…”
Section: Discussion
confidence: 99%
“…Test code quality represents a multi-faceted concept able to express how useful a test will be for developers during the understanding of the production code [18], the debugging activities [19,20], and the early catching of defects [21]. Over the last decade, a number of researchers have been studying test code quality with the aim of defining metrics able to characterize it under different perspectives.…”
Section: Background and Related Work
confidence: 99%
“…However, authors could not find substantial evidence on the influence of change part ordering on mental load or review performance. Spadini et al [46] designed and conducted a controlled experiment to investigate whether examining changed test code before the changed production code (also known as Test Driven Code Review or TDR) affects code review effectiveness. According to the findings of Spadini et al, developers adopting TDR find the same amount of defects in production code, but more defects in test code and fewer maintainability issues in the production code.…”
Section: Human Aspects in Modern Code Review
confidence: 99%
“…We consider the other variables as control variables, which also include the time spent on the review, the participant's role, years of experience in Java and Code Review, and tiredness. Finally, we run a logistic regression model similar to the one used by McIntosh et al. [28] and Spadini et al. [46]. To ensure that the selected logistic regression model is appropriate for the available data, we first (1) compute the Variance Inflation Factors (VIF) as a standard test for multicollinearity, finding all the values to be below 3 (values should be below 10), thus indicating little or no multicollinearity among the independent variables, (2) run a multilevel regression model to check whether there is a significant variance among reviewers, but we found little to none, thus indicating that a single-level regression model is appropriate, and, finally, (3) when building the model we added the independent variables step-by-step and found that the coefficients remained stable, thus further indicating little to no interference among the variables.…”
Section: Variables and Measurement Details
confidence: 99%
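The VIF check described in the statement above can be sketched in pure NumPy: for each predictor, regress it on the remaining predictors and compute VIF_j = 1 / (1 - R_j^2). This is an illustrative sketch, not the cited authors' actual code; the variable names and the synthetic data are invented for the example, and the conventional cutoff of 10 matches the quoted text.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of design matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary
    least-squares regression of column j on the other columns
    (with an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical data: x3 is nearly collinear with x1, so both get large VIFs,
# while the independent x2 stays close to 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X))
```

A VIF well above 10 for a predictor would signal the multicollinearity the quoted check is screening for; in that case one would drop or combine the offending variables before fitting the logistic regression.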