Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the adoption of these tools is hindered by their high false positive rates. If the false positive rate is too high, developers may get acclimated to violation reports from these tools, causing concrete and severe bugs being overlooked. Fortunately, some violations are actually addressed and resolved by developers. We claim that those violations that are recurrently fixed are likely to be true positives, and an automated approach can learn to repair similar unseen violations. However, there is lack of a systematic way to investigate the distributions on existing violations and fixed ones in the wild, that can provide insights into prioritizing violations for developers, and an effective way to mine code and fix patterns which can help developers easily understand the reasons of leading violations and how to fix them.In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair.
Automated program repair (APR) has extensively been developed by leveraging search-based techniques, in which fix ingredients are explored and identified in different granularities from a specific search space. State-of-the approaches often find fix ingredients by using mutation operators or leveraging manually-crafted templates. We argue that the fix ingredients can be searched in an online mode, leveraging code search techniques to find potentially-fixed versions of buggy code fragments from which repair actions can be extracted. In this study, we present an APR tool, LSRepair, that automatically explores code repositories to search for fix ingredients at the method-level granularity with three strategies of similar code search. Our preliminary evaluation shows that code search can drive a faster fix process (some bugs are fixed in a few seconds). LSRepair helps repair 19 bugs from the Defects4J benchmark successfully. We expect our approach to open new directions for fixing multiple-lines bugs.
One single code change can significantly influence a wide range of software systems and their users. For example, (a) adding a new feature can spread defects in several modules, while (b) changing an API method can improve the performance of all client programs. Unfortunately, developers often may not clearly know whether code changes are influential at commit time. This paper investigates influential software changes and proposes an approach to identify them immediately when they are applied. Our goals are to (a) identify existing influential changes (ICs) in software projects, (b) understand their characteristics, and (c) build a classification model of ICs to help developers find and address them early. We first conduct a post‐mortem analysis to discover existing influential changes by using intuitions (eg, changes referred by other changes). Then, we re‐categorize all identified changes through an open‐card sorting process. Subsequently, we conduct a survey with about 100 developers to finalize a taxonomy. Finally, from our ground truth, we extract features, including metrics such as the complexity of changes and file centrality in co‐change graphs to build machine learning classifiers. The experiment results show that our classification model with random samples achieves 86.8% precision, 74% recall, and 80.4% F‐measure, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.