When constructing an effort estimation model, it seems effective to use a window of training data so that the model is trained only with recent projects. Taking the chronological order of projects within the window into account, by weighting projects according to their position in the window, may also affect estimation accuracy. In this study, we examined the effects of weighted moving windows on effort estimation accuracy, comparing weighted and non-weighted moving windows under the same experimental settings. We confirmed that weighting methods significantly improved estimation accuracy with larger windows, although they also significantly worsened accuracy with smaller windows. This result contributes to understanding the properties of moving windows.
Context: Recent studies have shown that estimation accuracy can be affected by using only a window of recent projects as training data for building an effort estimation model. The idea has been extended to regression-based estimation by weighting projects differently according to their position within the window, which significantly improved estimation accuracy on a single-company dataset from the ISBSG repository. Objective: To investigate the effects on estimation accuracy of using weighted moving windows with a new dataset, and to compare results across datasets. Method: Using a set of projects drawn from the Finnish dataset (previously studied with regard to windows, but not with weighting) and a fixed-size window policy, we examine the effect of weighted moving windows on estimation accuracy. Results: With larger window sizes, weighting functions could significantly improve estimation accuracy compared with unweighted windows. The steepness of the weighting functions affects their effectiveness. However, in this dataset it is better to use a growing portfolio (retaining all past projects as training data) than to use windows.
Conclusions: The results reinforce previous studies: weighting functions can significantly improve the accuracy of regression-based estimation compared with not weighting, but in this dataset the use of moving windows reduces estimation accuracy.
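To make the idea of a weighted moving window concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes projects are listed in chronological order of completion as (size, effort) pairs, uses a fixed-size window of the most recent projects, and fits log(effort) = a + b*log(size) by weighted least squares. The weighting function `window_weights` and its `steepness` parameter are hypothetical: the newest project gets weight 1.0, older ones get smaller weights, and `steepness=0` reduces to an unweighted window.

```python
import math
from typing import List, Sequence, Tuple

# A project is (size, effort); history is in chronological order of completion.
Project = Tuple[float, float]


def window_weights(n: int, steepness: float) -> List[float]:
    """Hypothetical weighting function: position 1 (oldest) to n (newest).
    Weight = (position / n) ** steepness, so the newest project gets 1.0.
    steepness = 0 gives every project weight 1.0 (an unweighted window)."""
    return [((i + 1) / n) ** steepness for i in range(n)]


def weighted_linear_fit(xs: Sequence[float], ys: Sequence[float],
                        ws: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares for y = a + b*x (requires at least two distinct x)."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def estimate_effort(history: List[Project], new_size: float,
                    window_size: int, steepness: float) -> float:
    """Train only on the `window_size` most recent projects (fixed-size window),
    weight them by position, and predict effort for a project of `new_size`."""
    window = history[-window_size:]
    xs = [math.log(size) for size, _ in window]
    ys = [math.log(effort) for _, effort in window]
    ws = window_weights(len(window), steepness)
    a, b = weighted_linear_fit(xs, ys, ws)
    return math.exp(a + b * math.log(new_size))
```

For example, `estimate_effort(history, 120.0, window_size=20, steepness=2.0)` would emphasize the most recent projects in a 20-project window more strongly than `steepness=1.0`; varying `window_size` and `steepness` together mirrors, in spirit, the window-size and steepness comparisons described in the abstracts.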
To predict software quality, we must consider various factors, because software development consists of various activities that the software reliability growth model (SRGM) does not take into account. In this paper, we propose a model that predicts the final quality of a software product using a Bayesian belief network (BBN). With the BBN, we can construct a prediction model that focuses on the structure of the software development process, explicitly represents complex relationships between metrics, and handles uncertain metrics such as the residual faults in software products. To evaluate the constructed model, we perform an empirical experiment based on metrics data collected from development projects in a company. The empirical evaluation confirms that the proposed model can predict the number of residual faults, which the SRGM cannot handle.
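As an illustration of how a BBN can treat residual faults as an unobserved variable inferred from observed metrics, here is a minimal sketch of exact inference by enumeration over a tiny, hypothetical network; it is not the model from the paper. The nodes (injected faults `I`, testing effort `T`, detected faults `D`, residual faults `R`), the edges I→D, T→D, I→R, T→R, and every probability below are invented for illustration only.

```python
from itertools import product
from typing import Dict

STATES = ("high", "low")

# Hypothetical priors and conditional probability tables (not from the paper).
P_I = {"high": 0.3, "low": 0.7}  # injected faults
P_T = {"high": 0.5, "low": 0.5}  # testing effort
# P(D = high | I, T): detected faults depend on injected faults and testing.
P_D_HIGH = {("high", "high"): 0.9, ("high", "low"): 0.5,
            ("low", "high"): 0.2, ("low", "low"): 0.1}
# P(R = high | I, T): residual faults depend on injected faults and testing.
P_R_HIGH = {("high", "high"): 0.3, ("high", "low"): 0.7,
            ("low", "high"): 0.05, ("low", "low"): 0.2}


def p_state(p_high: float, state: str) -> float:
    """Probability of a binary state given P(state = high)."""
    return p_high if state == "high" else 1.0 - p_high


def joint(i: str, t: str, d: str, r: str) -> float:
    """Joint probability factorized along the network structure."""
    return (P_I[i] * P_T[t]
            * p_state(P_D_HIGH[(i, t)], d)
            * p_state(P_R_HIGH[(i, t)], r))


def query_residual(evidence: Dict[str, str]) -> Dict[str, float]:
    """P(R | evidence) by summing out the unobserved variables.
    `evidence` maps any of 'I', 'T', 'D' to 'high' or 'low'."""
    scores = {}
    for r in STATES:
        total = 0.0
        for i, t, d in product(STATES, repeat=3):
            assignment = {"I": i, "T": t, "D": d}
            if any(assignment[k] != v for k, v in evidence.items()):
                continue  # inconsistent with the observed metrics
            total += joint(i, t, d, r)
        scores[r] = total
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}
```

For instance, `query_residual({"T": "low", "D": "high"})` gives a higher probability of many residual faults than `query_residual({"T": "high", "D": "high"})`, because with these made-up tables light testing that still finds many faults suggests many faults were injected and many remain.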
Context: Cross-project defect prediction (CPDP) has been an active research area. One CPDP technique is a relevancy filter, which uses a clustering algorithm to select a useful subset of the cross-project data. The performance of such filters relies heavily on the quality of clustering, so using an advanced clustering algorithm instead of the simple ones used in past studies could improve performance. Objective: To propose and examine a new relevancy filter method using the advanced clustering method DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Method: We conducted an experiment that examined the predictive performance of the proposed method. The experiment compared three relevancy filter methods, namely Burak-filter, Peters-filter, and the proposed method, using data from 56 projects and four prediction models. Results: The predictive performance measures supported the proposed method, which outperformed Burak-filter and Peters-filter in terms of AUC and g-measure. Conclusion: The proposed method achieved better prediction than the conventional methods. The results suggest that exploring advanced clustering algorithms could benefit cross-project defect prediction.
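One plausible way to build a DBSCAN-based relevancy filter is sketched below; the abstract does not give the exact procedure, so this is an assumption, not the authors' method. The sketch clusters cross-project and target-project instances together with a plain DBSCAN implementation and keeps only the cross-project instances that fall into a cluster that also contains target instances; instances labeled as noise are dropped. The names `dbscan`, `dbscan_relevancy_filter`, `eps`, and `min_pts` are illustrative.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]
NOISE = -1


def region_query(points: List[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels  # type: ignore[return-value]


def dbscan_relevancy_filter(cross_data: List[Point], target_data: List[Point],
                            eps: float, min_pts: int) -> List[int]:
    """Hypothetical filter: indices of cross-project instances that share a
    DBSCAN cluster with at least one target-project instance."""
    points = list(cross_data) + list(target_data)
    labels = dbscan(points, eps, min_pts)
    n_cross = len(cross_data)
    target_clusters = {lab for lab in labels[n_cross:] if lab != NOISE}
    return [i for i, lab in enumerate(labels[:n_cross]) if lab in target_clusters]
```

In practice the features would usually be normalized before clustering, and `eps` and `min_pts` would need tuning per dataset; the selected indices would then be used to pick the training rows (features and defect labels) for the prediction model.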