Spreadsheet users are often unaware of the risks imposed by poorly designed spreadsheets. One way to assess spreadsheet quality is to detect smells which attempt to identify parts of spreadsheets that are hard to comprehend or maintain and which are more likely to be the root source of bugs. Unfortunately, current spreadsheet smell detection techniques suffer from a number of drawbacks that lead to incorrect or redundant smell reports. For example, the same quality issue is often reported for every copy of a cell, which may overwhelm users. To deal with these issues, we propose to refine spreadsheet smells by exploiting inferred structural information for smell detection. We therefore first provide a detailed description of our static analysis approach to infer clusters and blocks of related cells. We then elaborate on how to improve existing smells by providing three example refinements of existing smells that incorporate information about cell groups and computation blocks. Furthermore, we propose three novel smell detection techniques that make use of the inferred spreadsheet structures. Empirical evaluation of the proposed techniques suggests that the refinements successfully reduce the number of incorrectly and redundantly reported smells, and novel deficits are revealed by the newly introduced smells.
Spreadsheets are commonly used in organizations as a programming tool for business-related calculations and decision making. Since faults in spreadsheets can have severe business impacts, a number of approaches from general software engineering have been applied to spreadsheets in recent years, among them the concept of code smells. Smells can in particular be used for the task of fault prediction. An analysis of existing spreadsheet smells, however, revealed that the predictive power of individual smells can be limited. In this work we therefore propose a machine learning based approach which combines the predictions of individual smells by using an AdaBoost ensemble classi er. Experiments on two public datasets containing real-world spreadsheet faults show signi cant improvements in terms of fault prediction accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.