Class imbalance in datasets and parameter selection for the Support Vector Machine (SVM) are both crucial to software defect prediction. However, no existing work addresses these two problems simultaneously. To close this gap, a hybrid multi-objective cuckoo search under-sampled software defect prediction model based on SVM (HMOCS-US-SVM) is proposed to solve both problems at the same time. Firstly, a hybrid multi-objective cuckoo search with dynamical local search (HMOCS) is used to simultaneously select the non-defective samples and optimize the parameters of the SVM. Then, three under-sampling methods for the decision region range are proposed to select the non-defective modules. In the simulation, three indicators, namely the false positive rate (pf), the probability of detection (pd), and G-mean, are employed to measure the performance of the proposed algorithm. In addition, eight datasets from the Promise database are selected to verify the proposed software defect prediction model. Compared with the results of eight prediction models, the proposed method proves effective for the software defect prediction problem.
KEYWORDS
class imbalance, hybrid multi-objective cuckoo search, software defect prediction, SVM, under-sampled
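The three indicators named in the abstract (pd, pf, and G-mean) can be illustrated with a short Python sketch. It assumes the definitions commonly used in the defect prediction literature, since this part of the text does not state the formulas: pd = TP / (TP + FN), pf = FP / (FP + TN), and G-mean = sqrt(pd * (1 - pf)), with defective modules labelled 1 and non-defective modules labelled 0. The function name `defect_metrics` is hypothetical and used only for illustration.

```python
import math


def defect_metrics(y_true, y_pred):
    """Return (pd, pf, g_mean) for binary labels, where 1 = defective.

    Assumed definitions (not taken from the paper's text):
      pd     = TP / (TP + FN)        probability of detection
      pf     = FP / (FP + TN)        false positive rate
      g_mean = sqrt(pd * (1 - pf))
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1

    # Guard against empty classes so the ratios stay defined.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    g_mean = math.sqrt(pd * (1.0 - pf))
    return pd, pf, g_mean


if __name__ == "__main__":
    # Toy imbalanced example: 4 defective and 6 non-defective modules.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    pd, pf, g_mean = defect_metrics(y_true, y_pred)
    print(f"pd={pd:.3f} pf={pf:.3f} G-mean={g_mean:.3f}")
```

Under these definitions, a good model has high pd and low pf. G-mean rewards a balance between the two, which is why it is often preferred to plain accuracy on imbalanced data.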
INTRODUCTION
With the advancement of the networked society, software has been widely applied in many areas of life, such as banking systems, biopharmaceutical engineering, and traffic signal command. Therefore, increasing attention has been paid to the quality of software products. [1] Generally speaking, software quality mainly includes five aspects: reliability, understandability, availability, maintainability, and effectiveness. [2] Reliability in particular is an important factor related to software defects. [3] Software defects are errors introduced during software development, which can lead to faults, failures, and crashes, and can even endanger the safety of human life and property. [4] Therefore, finding as many defects as possible is particularly important. The core of software defect prediction (SDP) [5] is to extract characteristic attributes that indicate defect proneness from historical software modules, so as to predict the type or number of defects in new software projects.

Class imbalance (CIB) in datasets is an unavoidable problem in SDP: 80% of the defects are concentrated in 20% of the modules. [6] However, traditional classification algorithms [7] are built on the assumption of relatively balanced datasets and are therefore not suitable for imbalanced datasets. As a result, the classification algorithm is biased toward the non-defective modules. [8] Therefore, how to alleviate the imbalance of datasets is a major problem in SDP. To tackle the CIB problem, existing research can be roughly divided into cost-sensitive methods, [9] ensemble methods, [10] and sampling methods. [11]

• Cost-sensitive algorithms [12] solve the imbalance problem by modifying the algorithms, which means that the method improves the accuracy of classificatio...