Summary
In the early phases of software testing, a project may have only limited historical defect data. Learning a prediction model from such insufficient training data limits the efficacy of the learned predictor. In practice, many fault prediction datasets are publicly available. Recently, heterogeneous fault prediction (HFP) has been proposed to exploit such data even when the source and target projects are described by different metrics. However, existing HFP models do not investigate how to use mixed project data, that is, within‐project and cross‐project data together, to predict faults in the target project. Furthermore, defect data are often imbalanced. The imbalanced distribution of the source data usually leads to serious misclassification of fault‐prone instances, which degrades the predictor's performance. Existing HFP methods do not consider the class imbalance problem in the training stage. In this paper, we propose a novel Cost‐sensitive Label and Structure‐consistent Unilateral Projection (CLSUP) approach for HFP. CLSUP not only makes better use of the within‐project and cross‐project data but also alleviates the class imbalance problem by setting different misclassification costs for fault‐prone and non–fault‐prone instances. Extensive experiments on 30 projects demonstrate the effectiveness of CLSUP.