Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering 2015
DOI: 10.1145/2786805.2786825

Is the cure worse than the disease? Overfitting in automated program repair

Abstract: Automated program repair has shown promise for reducing the significant manual effort debugging requires. This paper addresses a deficit of earlier evaluations of automated repair techniques caused by repairing programs and evaluating generated patches' correctness using the same set of tests. Since tests are an imperfect metric of program correctness, evaluations of this type do not discriminate between correct patches and patches that overfit the available tests and break untested but desired functionality. …
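The evaluation problem the abstract describes can be illustrated with a small sketch. Everything in it is hypothetical and chosen for illustration only: the toy absolute-value specification, the function names (`buggy_abs`, `overfitting_patch`, `correct_patch`, `pass_rate`), and both test lists. It is not the paper's benchmark or tooling. It shows only that a patch can pass every test used during repair and still fail tests that were held out.

```python
from typing import Callable, List, Tuple

# A test case is (input, expected output). The specification assumed here
# is absolute value; it is a toy stand-in, not taken from the paper.
TestCase = Tuple[int, int]


def pass_rate(candidate: Callable[[int], int], tests: List[TestCase]) -> float:
    """Return the fraction of tests the candidate passes."""
    passed = sum(1 for x, expected in tests if candidate(x) == expected)
    return passed / len(tests)


def buggy_abs(x: int) -> int:
    # Original defect: negative inputs are returned unchanged.
    return x


def overfitting_patch(x: int) -> int:
    # Hypothetical patch that special-cases the single negative input
    # exercised by the repair tests instead of fixing the general case.
    if x == -3:
        return 3
    return x


def correct_patch(x: int) -> int:
    # Hypothetical patch that fixes the defect for all negative inputs.
    return -x if x < 0 else x


# Tests available to the repair process.
repair_tests: List[TestCase] = [(0, 0), (5, 5), (-3, 3)]
# Independent tests withheld from the repair process.
held_out_tests: List[TestCase] = [(-1, 1), (-7, 7), (2, 2), (-3, 3)]

for name, fn in [("buggy", buggy_abs),
                 ("overfitting patch", overfitting_patch),
                 ("correct patch", correct_patch)]:
    print(f"{name}: repair tests {pass_rate(fn, repair_tests):.2f}, "
          f"held-out tests {pass_rate(fn, held_out_tests):.2f}")
```

Judged on the repair tests alone, the two patches are indistinguishable, since both pass all of them. Only the held-out tests separate them, which is why using one test set for both repair and evaluation cannot detect overfitting.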

Citations: cited by 253 publications (185 citation statements)
References: 67 publications
“…This gives a nuanced picture of the results, which must however be taken, as usual, with a grain of salt: different tools may focus on achieving a better ranking vs. correctly fixing more bugs, and we do not imply that there is one universal measure of effectiveness. Anyway, our evaluation is widely applicable, including to papers that may not detail this aspect, and is in line with what done in other evaluations [14], [17], [25], [28], [29].…”
Section: E Threats To Validity; citation type: supporting
confidence: 73%
“…A more detailed analysis [28] of the fixes produced by GenProg and similar techniques has shown that only a small fraction of them is genuinely correct; for example, less than 2% of the bugs of [14] are correctly fixed. [28]'s analysis has pushed the research in APR to addressing this manifestation of the overfitting problem [29].…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…The second thing is that we strongly advice researchers evaluate the correctness rate of their automatic repair as well as fixing rate, no matter by human inspection or ground truth comparison in benchmark. Existing repair systems may fail to generate true patch due to test suite overfitting [61], [62], which is a concept in statistics or machine learning. Here overfitting means that the repair helps program perform well within test suite while fail in real usage.…”
Section: Impact Of Test Suite Quality; citation type: mentioning
confidence: 99%
“…GenProg itself is derived from earlier work by Weimar et al [11,37,39]. Smith et al is an example of a more recent use of this framework [34]. Although GenProg is perhaps the most commonly known framework, there is also a large body of GI literature dedicated to automatic bug fixing [1,2,4,28,31] that use alternative approaches.…”
Section: Related Work; citation type: mentioning
confidence: 99%