Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA 2015)
DOI: 10.1145/2771783.2771791
An analysis of patch plausibility and correctness for generate-and-validate patch generation systems

Abstract: We analyze reported patches for three prior generate-and-validate patch generation systems (GenProg, RSRepair, and AE). Because of experimental error, the majority of the reported patches violate the basic principle behind the design of these systems: they do not produce correct outputs even for the inputs in the test suite used to validate the patches. We also show that the overwhelming majority of the accepted patches are not correct and are equivalent to a single modification that simply deletes functionality…

Cited by 330 publications (393 citation statements). References 47 publications (91 reference statements).
“…This gives a nuanced picture of the results, which must however be taken, as usual, with a grain of salt: different tools may focus on achieving a better ranking vs. correctly fixing more bugs, and we do not imply that there is one universal measure of effectiveness. In any case, our evaluation is widely applicable, including to papers that may not detail this aspect, and is in line with what is done in other evaluations [14], [17], [25], [28], [29].…”
Section: E. Threats to Validity (supporting)
confidence: 73%
“…We quantitatively compare JAID to all other available tools for APR of Java programs that have also used DEFECTS4J in their evaluations: 1) jGenProg is the implementation of GenProg [14], [33], which works on C, for Java programs; we refer to jGenProg's evaluation in [19]; 2) jKali is the implementation of Kali [28], which works on C, for Java programs; we refer to jKali's evaluation in [19]; 3) Nopol focuses on fixing Java conditional expressions; we refer to Nopol's evaluation in [19]; 4) xPAR is a reimplementation of PAR [12], which is not publicly available, discussed in [13] and [35]; 5) HDA implements the "history-driven" technique of [13]; 6) ACS implements the "precise condition synthesis" of [35].…”
Section: Setup (mentioning)
confidence: 99%
“…The second thing is that we strongly advise researchers to evaluate the correctness rate of their automatic repair as well as the fixing rate, whether by human inspection or by ground-truth comparison against a benchmark. Existing repair systems may fail to generate a true patch due to test-suite overfitting [61], [62], a concept borrowed from statistics and machine learning. Here, overfitting means that the repair makes the program perform well on the test suite while failing in real usage.…”
Section: Impact of Test Suite Quality (mentioning)
confidence: 99%
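
To make the overfitting failure mode concrete, the following is a minimal sketch in Python; it is not taken from the paper or the citing work, and all function names, tests, and the seeded defect are hypothetical. It shows a patch that is plausible with respect to a weak test suite only because it deletes functionality, which is the pattern the study above reports for most accepted GenProg, RSRepair, and AE patches.

```python
# Minimal sketch (hypothetical): a generate-and-validate "patch" can be
# plausible, i.e. pass every test in a weak suite, while being incorrect
# because it simply deletes functionality.

def apply_discount_buggy(total, items):
    """Intended behavior: 10% discount for 3 or more items.
    Defect: crashes with ZeroDivisionError when items == 0."""
    rate = 0.10 if items >= 3 else 0.0
    per_item = total / items            # the seeded defect
    return total - total * rate if per_item > 0 else total

def apply_discount_patched(total, items):
    """A functionality-deleting 'repair': the discount logic is removed,
    so the crash disappears and all (weak) tests still pass."""
    return total

# Weak validation suite: it never checks that a discount is actually applied.
suite = [((100.0, 0), 100.0), ((50.0, 1), 50.0)]

def plausible(candidate):
    """A patch is plausible if it produces the expected output for every test."""
    try:
        return all(candidate(*args) == expected for args, expected in suite)
    except Exception:
        return False

if __name__ == "__main__":
    print(plausible(apply_discount_buggy))    # False: crashes on (100.0, 0)
    print(plausible(apply_discount_patched))  # True: yet no discount is ever given
```

A stronger suite containing an input with three or more items would reject the deletion patch, which is why the cited statement recommends assessing correctness, not just test-suite plausibility.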