Abstract: The need for accurate software prediction systems increases as software becomes larger and more complex. A variety of techniques have been proposed; however, none has proven consistently accurate, and there is still much uncertainty as to which technique suits which type of prediction problem. We believe that the underlying characteristics of the data set (size, number of features, type of distribution, etc.) influence the choice of the prediction system to be used. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristics. Also, in previous work it has proven difficult to obtain significant results over small data sets; consequently, it would be useful to have a large validation data set. Our solution is to simulate data, allowing both control and the possibility of large validation sets (1,000 cases). In this paper, we compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider prediction context when evaluating competing prediction systems. We also observed that the "messier" the data and the more complex the relationship with the dependent variable, the more variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set sampled from the underlying data set. This suggests that researchers will need to exercise caution when comparing different approaches and utilize procedures such as bootstrapping in order to generate multiple samples for training purposes.
However, our most important result is that it is more fruitful to ask which is the best prediction system in a particular context rather than which is the "best" prediction system.
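The bootstrap evaluation idea described above can be sketched in a few lines: draw several training samples with replacement from a simulated data set, fit a simple nearest-neighbor predictor to each, and observe how accuracy varies between samples. The toy data generator, the linear effort relationship, and all names below are illustrative assumptions, not the authors' actual simulation procedure.

```python
import random
import statistics

random.seed(0)

def simulate_case():
    """One simulated project: a single size feature and a noisy effort value."""
    size = random.uniform(10.0, 100.0)
    effort = 3.0 * size + random.gauss(0.0, 5.0)  # assumed linear relationship
    return (size, effort)

def nearest_neighbour_predict(train, size):
    """Predict effort as the effort of the closest training case (1-NN)."""
    return min(train, key=lambda case: abs(case[0] - size))[1]

def mmre(train, test):
    """Mean magnitude of relative error over the test cases."""
    errors = [abs(nearest_neighbour_predict(train, s) - e) / e for s, e in test]
    return sum(errors) / len(errors)

population = [simulate_case() for _ in range(200)]
test_set = [simulate_case() for _ in range(1000)]  # large validation set

# Bootstrap: resample the training pool with replacement several times and
# record the accuracy obtained from each resampled training set.
scores = []
for _ in range(20):
    train = [random.choice(population) for _ in range(len(population))]
    scores.append(mmre(train, test_set))

print(f"MMRE mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
```

The spread of the bootstrap scores is the quantity of interest here: a large standard deviation across resampled training sets is exactly the training-set sensitivity the abstract warns about.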
The Cognitive Dimensions of Notations framework has been created to assist the designers of notational systems and information artifacts to evaluate their designs with respect to the impact that they will have on the users of those designs. The framework emphasizes the design choices available to such designers, including characterization of the user's activity, and the inevitable tradeoffs that will occur between potential design options. The resulting framework has been under development for over 10 years, and now has an active community of researchers devoted to it. This paper first introduces Cognitive Dimensions. It then summarizes the current activity, especially the results of a one-day workshop devoted to Cognitive Dimensions in December 2000, and reviews the ways in which it applies to the field of Cognitive Technology.
Traditionally, researchers have used either off-the-shelf models such as COCOMO, or developed local models using statistical techniques such as stepwise regression, to obtain software effort estimates. More recently, attention has turned to a variety of machine learning methods such as artificial neural networks (ANNs), case-based reasoning (CBR) and rule induction (RI). This paper outlines some comparative research into the use of these three machine learning methods to build software effort prediction systems. We briefly describe each method and then apply the techniques to a dataset of 81 software projects derived from a Canadian software house in the late 1980s. We compare the prediction systems in terms of three factors: accuracy, explanatory value and configurability. We show that ANN methods have superior accuracy and that RI methods are least accurate. However, this view is somewhat counteracted by problems with explanatory value and configurability. For example, we found that considerable effort was required to configure the ANN and that this compared very unfavourably with the other techniques, particularly CBR and least squares regression (LSR). We suggest that further work be carried out, both to further explore interaction between the end-user and the prediction system, and also to facilitate configuration, particularly of ANNs.
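The accuracy comparison described in this abstract can be illustrated with a minimal sketch: fit a least squares regression (LSR) line and a 1-nearest-neighbour analogue of CBR to the same effort data, then score both by mean magnitude of relative error (MMRE), a common accuracy measure in this literature. The project data below are invented for illustration; they are not the 81-project Canadian data set, and the helper names are assumptions.

```python
def fit_lsr(data):
    """Ordinary least squares for one predictor: effort = a + b * size."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    b = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x, _ in data)
    a = my - b * mx
    return lambda x: a + b * x

def fit_cbr(data):
    """1-nearest-neighbour predictor, a minimal form of case-based reasoning."""
    return lambda x: min(data, key=lambda case: abs(case[0] - x))[1]

def mmre(predict, cases):
    """Mean magnitude of relative error: mean of |predicted - actual| / actual."""
    return sum(abs(predict(x) - y) / y for x, y in cases) / len(cases)

# Illustrative project data: (size in KLOC, effort in person-months).
train = [(10, 32), (20, 61), (35, 104), (50, 148), (70, 215), (90, 268)]
test = [(15, 47), (40, 122), (80, 240)]

for name, fit in [("LSR", fit_lsr), ("CBR", fit_cbr)]:
    print(f"{name}: MMRE = {mmre(fit(train), test):.3f}")
```

On this small hand-made sample the regression line happens to win, which is itself a reminder of the papers' shared point: relative accuracy depends heavily on the data set at hand.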