Data analysts use computational notebooks to write code for analyzing and visualizing data. Notebooks help analysts iteratively write analysis code by letting them interleave code with output, and selectively execute cells. However, as analysis progresses, analysts leave behind old code and outputs, and overwrite important code, producing cluttered and inconsistent notebooks. This paper introduces code gathering tools, extensions to computational notebooks that help analysts fnd, clean, recover, and compare versions of code in cluttered, inconsistent notebooks. The tools archive all versions of code outputs, allowing analysts to review these versions and recover the subsets of code that produced them. These subsets can serve as succinct summaries of analysis activity or starting points for new analyses. In a qualitative usability study, 12 professional analysts found the tools useful for cleaning notebooks and writing analysis code, and discovered new ways to use them, like generating personal documentation and lightweight versioning. CCS CONCEPTS • Human-centered computing → Interactive systems and tools; • Software and its engineering → Development frameworks and environments.
Code analyzers such as Error Prone and FindBugs detect code patterns symptomatic of bugs, performance issues, or bad style. These tools express patterns as quick fixes that detect and rewrite unwanted code. However, it is difficult to come up with new quick fixes and decide which ones are useful and frequently appear in real code. We propose to rely on the collective wisdom of programmers and learn quick fixes from revision histories in software repositories. We present REVISAR, a tool for discovering common Java edit patterns in code repositories. Given code repositories and their revision histories, REVISAR (i) identifies code edits from revisions and (ii) clusters edits into sets that can be described using an edit pattern. The designers of code analyzers can then inspect the patterns and add the corresponding quick fixes to their tools. We ran REVISAR on nine popular GitHub projects, and it discovered 89 useful edit patterns that appeared in 3 or more projects. Moreover, 64% of the discovered patterns did not appear in existing tools. We then conducted a survey with 164 programmers from 124 projects and found that programmers significantly preferred eight out of the nine of the discovered patterns. Finally, we submitted 16 pull requests applying our patterns to 9 projects and, at the time of the writing, programmers accepted 6 (60%) of them. The results of this work aid toolsmiths in discovering quick fixes and making informed decisions about which quick fixes to prioritize based on patterns programmers actually apply in practice.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.