"Transfer learning": is the process of translating quality predictors learned in one data set to another. Transfer learning has been the subject of much recent research. In practice, that research means changing models all the time as transfer learners continually exchange new models to the current project. This paper offers a very simple "bellwether" transfer learner. Given N data sets, we find which one produces the best predictions on all the others. This "bellwether" data set is then used for all subsequent predictions (or, until such time as its predictions start failing-at which point it is wise to seek another bellwether). Bellwethers are interesting since they are very simple to find (just wrap a for-loop around standard data miners). Also, they simplify the task of making general policies in SE since as long as one bellwether remains useful, stable conclusions for N data sets can be achieved just by reasoning over that bellwether. From this, we conclude (1) this bellwether method is a useful (and very simple) transfer learning method; (2) "bellwethers" are a baseline method against which future transfer learners should be compared; (3) sometimes, when building increasingly complex automatic methods, researchers should pause and compare their supposedly more sophisticated method against simpler alternatives. CCS Concepts •Software and its engineering → Software creation and management;
Software analytics builds quality prediction models for software projects. Experience shows that (a) the more projects studied, the more varied are the conclusions; and (b) project managers lose faith in the results of software analytics if those results keep changing. To reduce this conclusion instability, we propose the use of "bellwethers": given N projects from a community the bellwether is the project whose data yields the best predictions on all others. The bellwethers offer a way to mitigate conclusion instability because conclusions about a community are stable as long as this bellwether continues as the best oracle. Bellwethers are also simple to discover (just wrap a for-loop around standard data miners). When compared to other transfer learning methods (TCA+, transfer Naive Bayes, value cognitive boosting), using just the bellwether data to construct a simple transfer learner yields comparable predictions. Further, bellwethers appear in many SE tasks such as defect prediction, effort estimation, and bad smell detection. We hence recommend using bellwethers as a baseline method for transfer learning against which future work should be compared.
Abstract-Increasingly, Software Engineering (SE) researchers use search-based optimization techniques to solve SE problems with multiple conflicting objectives. These techniques often apply CPU-intensive evolutionary algorithms to explore generations of mutations to a population of candidate solutions. An alternative approach, proposed in this paper, is to start with a very large population and sample down to just the better solutions. We call this method "SWAY ", short for "the sampling way". This paper compares SWAY versus state-of-the-art search-based SE tools using seven models: five software product line models; and two other software process control models (concerned with project management, effort estimation, and selection of requirements) during incremental agile development. For these models, the experiments of this paper show that SWAY is competitive with corresponding state-of-the-art evolutionary algorithms while requiring orders of magnitude fewer evaluations. Considering the simplicity and effectiveness of SWAY , we, therefore, propose this approach as a baseline method for search-based software engineering models, especially for models that are very slow to execute.
Actionable analytics are those that humans can understand, and operationalize. What kind of data mining models generate such actionable analytics? According to psychological scientists, humans understand models that most match their own internal models, which they characterize as lists of "heuristic" (i.e., lists of very succinct rules). One such heuristic rule generator is the Fast-and-Frugal Trees (FFT) preferred by psychological scientists. Despite their successful use in many applied domains, FFTs have not been applied in software analytics. Accordingly, this paper assesses FFTs for software analytics.We find that FFTs are remarkably effective. Their models are very succinct (5 lines or less describing a binary decision tree). These succinct models outperform state-of-the-art defect prediction algorithms defined by Ghortra et al. at ICSE'15. Also, when we restrict training data to operational attributes (i.e., those attributes that are frequently changed by developers), FFTs perform much better than standard learners.Our conclusions are two-fold. Firstly, there is much that software analytics community could learn from psychological science. Secondly, proponents of complex methods should always baseline those methods against simpler alternatives. For example, FFTs could be used as a standard baseline learner against which other software analytics tools are compared.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.