Background: This paper describes an analysisthat was conducted on newly collected repository with 92 versions of 38 proprietary, open-source and academic projects. A preliminary study perfomed before showed the need for a further in-depth analysis in order to identify project clusters. Aims: The goal of this research is to perform clustering on software projects in order to identify groups of software projects with similar characteristic from the defect prediction point of view. One defect prediction model should work well for all projects that belong to such group. The existence of those groups was investigated with statistical tests and by comparing the mean value of prediction efficiency. Method: Hierarchical and k-means clustering, as well as Kohonen's neural network was used to find groups of similar projects. The obtained clusters were investigated with the discriminant analysis. For each of the identified group a statistical analysis has been conducted in order to distinguish whether this group really exists. Two defect prediction models were created for each of the identified groups. The first one was based on the projects that belong to a given group, and the second one-on all the projects. Then, both models were applied to all versions of projects from the investigated group. If the predictions from the model based on projects that belong to the identified group are significantly better than the all-projects model (the mean values were compared and statistical tests were used), we conclude that the group really exists. Results: Six different clusters were identified and the existence of two of them was statistically proven: 1) cluster proprietary B-T=19, p=0.035, r=0.40; 2) cluster proprietary/open-t(17)=3.18, p=0.05, r=0.59. The obtained effect sizes (r) represent large effects according to Cohen's benchmark, which is a substantial finding. Conclusions: The two identified clusters were described and compared with results obtained by other researchers. The results of this work makes next step towards defining formal methods of reuse defect prediction models by identifying groups of projects within which the same defect prediction model may be used. Furthermore, a method of clustering was suggested and applied.
The knowledge about the software metrics which serve as defect indicators is vital for the efficient allocation of resources for quality assurance. It is the process metrics, although sometimes difficult to collect, which have recently become popular with regard to defect prediction. However, in order to identify rightly the process metrics which are actually worth collecting, we need the evidence validating their ability to improve the product metric-based defect prediction models. This paper presents an empirical evaluation in which several process metrics were investigated in order to identify the ones which significantly improve the defect prediction models based on product metrics. Data from a wide range of software projects (both, industrial and open source) were collected. The predictions of the models that use only product metrics (simple models) were compared with the predictions of the models which used product metrics, as well as one of the process metrics under scrutiny (advanced models). To decide whether the improvements were significant or not, statistical tests were performed and effect sizes were calculated. The advanced defect prediction models trained on a data set containing product metrics and additionally Number of Distinct Committers (NDC) were significantly better than the simple models without NDC, while the effect size was medium and the probability of superiority (PS) of the advanced models over simple ones was high (p ¼ :016, r ¼ À:29, PS ¼ :76), which is a substantial finding useful in defect prediction. A similar result with slightly smaller PS was achieved by the advanced models trained on a data set containing product metrics and additionally all of the investigated process metrics (p ¼ :038, r ¼ À:29, PS ¼ :68). The advanced models trained on a data set containing product metrics and additionally Number of Modified Lines (NML) were significantly better than the simple models without NML, but the effect size was small
Code review is a widely adopted means of improving software quality. There are numerous techniques of reviewing including tool‐assisted and over‐the‐shoulder. Both require significant effort to perform. However, the impact the over‐the‐shoulder has on review effectiveness remains unexplored. The authors goal was to compare the techniques and the effect of review size to analyse how that affects the effectiveness. They performed three experiments within one company. Changes provided by developers were reviewed twice by different reviewers, each time using a different technique. Afterwards, participants rated in a questionnaire the influence of each technique to the knowledge transfer. They observed that the number of accepted comments reported with the tool‐assisted technique is approximately twice as big. Also, they noticed a relationship between comments density and the review size: the correlations vary from − 0.42 to − 0.33. The knowledge transfer was typically evaluated as it supports knowledge sharing to a limited extent for the tool‐assisted technique but as it is good at supporting knowledge sharing for the over‐the‐shoulder technique. The results did not give a simple answer whether one technique outperforms the other. The tool‐assisted technique results in more comments. However, the teams evaluated the over‐the‐shoulder technique as better supporting knowledge transfer.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.