Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches, varying in accuracy, complexity, and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, and ranking of the entities, both with and without taking into account the effort needed to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem: generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
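To make the three evaluation modes concrete, here is a minimal sketch, not the paper's actual evaluation harness; the entity names, predicted scores, and LOC figures are invented for illustration:

```python
# Hypothetical illustration of the evaluation modes described above:
# classification, plain ranking, and effort-aware ranking.
entities = {
    "Parser.java": {"score": 0.91, "loc": 1200},
    "Lexer.java":  {"score": 0.64, "loc": 150},
    "Utils.java":  {"score": 0.12, "loc": 900},
}

# 1. Classification: threshold the predicted defect-proneness.
defect_prone = {n for n, e in entities.items() if e["score"] >= 0.5}

# 2. Ranking: order entities by predicted defect-proneness alone.
by_score = sorted(entities, key=lambda n: entities[n]["score"], reverse=True)

# 3. Effort-aware ranking: normalize by review effort (LOC as a proxy),
#    so that small but risky files are reviewed before large ones.
by_effort = sorted(entities,
                   key=lambda n: entities[n]["score"] / entities[n]["loc"],
                   reverse=True)

print(defect_prone)  # {'Parser.java', 'Lexer.java'}
print(by_score)      # ['Parser.java', 'Lexer.java', 'Utils.java']
print(by_effort)     # ['Lexer.java', 'Parser.java', 'Utils.java']
```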
When the Application Programming Interface (API) of a framework or library changes, its clients must be adapted. This change propagation, known as a ripple effect, is a problem that has garnered interest: several approaches have been proposed in the literature to react to these changes. Although studies of ripple effects exist at the level of single systems, no study has examined the actual extent and impact of these API changes in practice, across an entire software ecosystem and its community of developers. This paper reports on an empirical study of API deprecations that led to ripple effects across an entire ecosystem. Our case study subject is the development community gravitating around the Squeak and Pharo software ecosystems: seven years of evolution, more than 3,000 contributors, and more than 2,600 distinct systems. We analyzed 577 methods and 186 classes that were deprecated, and answer research questions regarding the frequency, magnitude, duration, adaptation, and consistency of the ripple effects triggered by API changes.
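The mechanism behind such a ripple effect is easy to see in miniature. The study's subject is Smalltalk, but the following Python sketch, with invented class and method names, shows the typical deprecation pattern: the old method warns and forwards to its replacement, and every client still calling it must eventually be adapted:

```python
import warnings

class Collection:
    """Library class whose API is changing."""
    def __init__(self, items):
        self._items = list(items)

    def contains(self, item):
        """New API."""
        return item in self._items

    def includes(self, item):
        """Old API: deprecated, forwards to its replacement."""
        warnings.warn("includes() is deprecated; use contains()",
                      DeprecationWarning, stacklevel=2)
        return self.contains(item)

# A client elsewhere in the ecosystem still uses the old method; until its
# code is adapted, every call emits a warning: the ripple has reached it.
c = Collection([1, 2, 3])
print(c.includes(2))  # True, plus a DeprecationWarning
```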
E-mails concerning the development issues of a system constitute an important source of information about high-level design decisions, low-level implementation concerns, and the social structure of developers. Establishing links between e-mails and the software artifacts they discuss is a non-trivial problem, due to the inherently informal nature of human communication. Different approaches can be brought into play to tackle this traceability issue, but the question of how they can be evaluated remains unaddressed, as there is no recognized benchmark against which they can be compared. In this article we present such a benchmark, which we created through the manual inspection of a statistically significant number of e-mails pertaining to six unrelated software systems. We then use our benchmark to measure the effectiveness of a number of approaches, ranging from lightweight approaches based on regular expressions to full-fledged information retrieval approaches.
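As a sketch of the lightweight end of that spectrum, here is a minimal, hypothetical regex-based linker; the class names and e-mail text are invented. It also shows why informal communication defeats naive matching:

```python
import re

# Hypothetical lightweight linker: an e-mail is linked to a class when the
# class name occurs as a whole word in the message body.
classes = ["Parser", "SymbolTable", "CodeGenerator"]

email = """Hi all, I think the bug is in SymbolTable: lookup() returns None
whenever the Parser pushes a new scope. CodeGen looks fine to me."""

links = [c for c in classes if re.search(r"\b" + re.escape(c) + r"\b", email)]
print(links)  # ['Parser', 'SymbolTable']; the informal 'CodeGen' is missed
```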
Code completion is a widely used productivity tool. It takes away the burden of remembering and typing the exact names of methods or classes: as a developer starts typing a name, it provides a progressively refined list of candidates matching the name. However, the candidate list always comes in alphabetic order, i.e., the environment is only second-guessing the name based on pattern matching. Finding the correct candidate can be cumbersome, or slower than typing the full name. We present an approach to improve code completion with program history. We define a benchmark measuring the accuracy and usefulness of a code completion engine. Further, we use change history data to improve the results offered by code completion tools. Finally, we propose an alternative interface for completion tools.
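A minimal sketch of the core idea: rank matching candidates by how recently they were changed rather than alphabetically. The method names and timestamps below are invented for illustration:

```python
# Hypothetical change history: method name -> time of last change.
history = {
    "toString": 1100,
    "toArray": 1500,
    "toLowerCase": 900,
}

def complete(prefix, alphabetic=False):
    """Return completion candidates for `prefix`."""
    candidates = [m for m in history if m.startswith(prefix)]
    if alphabetic:
        return sorted(candidates)  # baseline IDE behavior
    # History-aware: recently changed entities are the likeliest targets.
    return sorted(candidates, key=history.get, reverse=True)

print(complete("to", alphabetic=True))  # ['toArray', 'toLowerCase', 'toString']
print(complete("to"))                   # ['toArray', 'toString', 'toLowerCase']
```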
Static type systems play an essential role in contemporary programming languages. Despite their importance, whether static type systems influence human software development capabilities remains an open question. One frequently mentioned argument for static type systems is that they improve the maintainability of software systems, an often-used claim for which there is no empirical evidence. This paper describes an experiment that tests whether static type systems improve the maintainability of software systems. The results provide rigorous empirical evidence that static type systems are indeed beneficial to maintenance activities, except for fixing semantic errors.