Spreadsheets used in companies often contain several thousand formulas. Localizing faulty cells in such large spreadsheets can be time-consuming and frustrating. Spectrum-based fault localization (SFL) helps users locate the faulty cell(s) faster. However, SFL depends on the information the user provides. In this paper, we address three research questions in this context: (RQ1) Do spreadsheets contain correct output cells that positively or negatively influence the ranking of the faulty cells? (RQ2) If yes, is it possible to determine a priori which correct output cells would positively influence the ranking? (RQ3) Is it possible to avoid a decrease in fault localization quality when adding more correct output cells? This paper shows that there exist correct output cells which positively or negatively influence the ranking. In particular, correct output cells with the largest cones positively influence the ranking of the faulty cell. Balancing the ratio of correct to erroneous output cells by duplicating the cones of erroneous output cells improves the fault localization quality.
I. INTRODUCTION

Spreadsheets are a well-known example of end-user programming. They are used in more than 95 % of US companies [1], e.g., for financial reporting, data management, forward planning, and investment decisions. Unfortunately, spreadsheets contain errors at an alarmingly high rate. Panko [2] estimates the human error rate when writing formulas to be 3 % to 5 %. Therefore, a spreadsheet with 100 formulas contains at least one fault with a probability of 86.7 % to 99.4 %. Even spreadsheets that form the basis for important decisions often contain faults. The European Spreadsheet Risk Interest Group (http://www.eusprig.org/horror-stories.htm) lists examples where spreadsheet faults led to financial losses.

When erroneous behavior is observed in a spreadsheet, the process of fault localization starts. Since spreadsheets used in companies are often large, containing several thousand formulas, fault localization by manual inspection can be time-intensive and frustrating. In such cases, automatic fault localization techniques can help to narrow down the search space.

In previous work [3], we have discussed the application of spectrum-based fault localization (SFL) to spreadsheets.
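The fault probabilities quoted in the introduction for a spreadsheet with 100 formulas can be checked with a short calculation. The sketch below assumes that each formula is faulty independently with the same per-formula error rate p, so that the probability of at least one fault among n formulas is 1 - (1 - p)^n. The independence assumption and the function name `prob_at_least_one_fault` are ours and are not stated in the text. Under this assumption, an error rate of 5 % gives about 99.4 %, an error rate of 3 % gives about 95.2 %, and the quoted lower bound of 86.7 % corresponds to an error rate of 2 %.

```python
def prob_at_least_one_fault(error_rate: float, n_formulas: int) -> float:
    """Probability that at least one of n_formulas is faulty.

    Assumption (not from the text): each formula is faulty independently
    with the same probability error_rate.
    """
    # P(no fault at all) = (1 - p)^n; the complement is P(at least one fault).
    return 1.0 - (1.0 - error_rate) ** n_formulas


if __name__ == "__main__":
    # Per-formula error rates to compare for a 100-formula spreadsheet.
    for rate in (0.02, 0.03, 0.05):
        p = prob_at_least_one_fault(rate, 100)
        print(f"error rate {rate:.0%}: P(at least one fault) = {p:.1%}")
```

Even at the lowest of these rates, a fault somewhere in a 100-formula spreadsheet is far more likely than not, which motivates automated support for localizing it.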