Many software defect prediction approaches have been proposed, and most are effective in within-project prediction settings. However, for new projects or projects with limited training data, it is desirable to learn a prediction model using sufficient training data from existing source projects and then apply the model to target projects (cross-project defect prediction). Unfortunately, the performance of cross-project defect prediction is generally poor, largely because of feature distribution differences between the source and target projects. In this paper, we apply a state-of-the-art transfer learning approach, TCA, to make the feature distributions in the source and target projects similar. In addition, we propose a novel transfer defect learning approach, TCA+, by extending TCA. Our experimental results for eight open-source projects show that TCA+ significantly improves cross-project prediction performance.

Index Terms: cross-project defect prediction, transfer learning, empirical software engineering

I. INTRODUCTION

Recently, numerous effective software defect prediction approaches have been proposed and have received a tremendous amount of attention [1], [2], [3], [4], [5]. Most approaches employ machine learning classifiers to build a prediction model from data sets mined from software repositories, and the model is then used to identify software defects. However, most approaches are evaluated in within-project settings, i.e., a prediction model is built from one part of a project and evaluated on the remainder of the project by 10-fold cross-validation [2], [6], [7], [8] and/or random instance splits [5], [9], [10].

In practice, cross-project defect prediction is necessary. New projects often do not have enough defect data to build a prediction model. This cold-start problem is well known in recommender systems [11] and can be addressed by cross-project defect prediction, which builds a prediction model using data from other projects.
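The two within-project evaluation setups mentioned above, 10-fold cross-validation and random instance splits, can be sketched as follows. This is a minimal illustration only: the defect labels are synthetic, and a trivial majority-class predictor stands in for a real classifier; none of it reflects the paper's actual data sets or experiments.

```python
# Sketch of within-project evaluation: (a) 10-fold cross-validation,
# (b) a random instance split. Labels and "classifier" are placeholders.
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle instance indices and partition them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def random_split(n, train_ratio=0.5, seed=0):
    """Randomly split instance indices into a train set and a test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_ratio)
    return idx[:cut], idx[cut:]

def majority_vote(train_labels):
    """Trivial stand-in classifier: always predict the most common label."""
    return max(set(train_labels), key=train_labels.count)

# Synthetic defect labels: 1 = defective, 0 = clean (placeholder data).
labels = [1 if i % 4 == 0 else 0 for i in range(100)]

# (a) 10-fold cross-validation: each fold serves as the test set once,
# while the remaining nine folds are used for training.
folds = k_fold_indices(len(labels), k=10)
accs = []
for i, test in enumerate(folds):
    train = [j for m, f in enumerate(folds) if m != i for j in f]
    pred = majority_vote([labels[j] for j in train])
    accs.append(sum(labels[j] == pred for j in test) / len(test))
print("10-fold mean accuracy:", sum(accs) / len(accs))

# (b) Random instance split: a single 50/50 train/test partition.
train, test = random_split(len(labels))
pred = majority_vote([labels[j] for j in train])
split_acc = sum(labels[j] == pred for j in test) / len(test)
print("random-split accuracy:", split_acc)
```

The key point of the protocol is that training and test instances come from the same project, so their feature and label distributions match; cross-project prediction, discussed next, breaks that assumption.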
The model is then applied to the new projects. However, cross-project defect prediction often yields poor performance. Zimmermann et al. [12] evaluated cross-project defect prediction performance using data from 12 projects (622 cross-project combinations). They found that only 21 pairs yielded reasonable prediction performance. One of the main reasons for this poor performance is the difference between the data distributions of the source and target projects. Most machine learning classifiers are designed under the assumption that training and