Support vector regression‐based imputation in analogy‐based software development effort estimation

Idri, Ali; Abnane, Ibtissam; Abran, Alain

doi:10.1002/smr.2114

Cited by 17 publications (18 citation statements); references 66 publications.


“…• The parameter of fuzziness m controls the extent of sharing among fuzzy clusters [24, 48]. In other words, low values of m mean that a project is more likely to belong to a single cluster, while high values mean that it is more likely to belong to several clusters. Based on our previous study [24], we vary m from 1.5 to 3.5 in increments of 0.5.…”

Section: Empirical Design (mentioning)
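To make the role of m concrete, here is a minimal sketch assuming the standard fuzzy c-means membership rule, u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)), where d_i is the distance from a project to cluster centre i. The snippet does not state the formula, and the two distances in the demo are hypothetical. The sweep uses the grid quoted above (1.5 to 3.5 in steps of 0.5).

```python
def fcm_memberships(distances, m):
    """Fuzzy c-means membership of one point in each cluster.

    distances: distances from the point to each cluster centre (all > 0).
    m: fuzziness parameter, must be greater than 1.
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Fuzziness grid described in the text: 1.5 to 3.5 in increments of 0.5.
M_GRID = [1.5 + 0.5 * i for i in range(5)]

if __name__ == "__main__":
    # Hypothetical project at distance 1 from centre A and 2 from centre B.
    for m in M_GRID:
        u = fcm_memberships([1.0, 2.0], m)
        print(f"m={m}: membership A={u[0]:.3f}, B={u[1]:.3f}")
```

With distances 1 and 2, the membership in the nearer cluster falls from about 0.94 at m = 1.5 to about 0.64 at m = 3.5, which is the "more sharing" behaviour the quote describes.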


“…[49] The most commonly used approach to deciding the optimal number of clusters is to run the clustering algorithm several times, each time with a different number of clusters, and then select the number of clusters that gives the best result according to a predefined criterion [48]. This predefined criterion function is called the cluster validity index.…”

Section: Empirical Design (mentioning)
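The snippet leaves the validity index abstract. As one widely used example (not the Tsekouras index the study relies on), the Xie-Beni index in its commonly used generalised form scores a fuzzy partition of n projects x_j into k clusters with centres v_i, memberships u_ij and fuzziness m by dividing compactness by separation; lower values indicate a better partition:

$$
V_{XB} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{m} \, \lVert x_j - v_i \rVert^{2}}{n \cdot \min_{i \neq l} \lVert v_i - v_l \rVert^{2}}
$$

Running the clustering once per candidate k and keeping the k with the smallest V_XB is one instance of the procedure quoted above.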


“…Although validity indexes are primarily used to decide on the optimal number of clusters k, they can also be used to determine the optimal fuzziness parameter m [48]. This study used the validity index of [49] to determine m, k, and β. Accordingly, a grid search was performed, and the Tsekouras index was calculated for each configuration (m, k, β). The optimal configuration is the one at which the Tsekouras index reaches its lowest value.…”

Section: Empirical Design (mentioning)
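The grid search described above can be sketched in plain Python. This is an illustration under stated assumptions, not the study's code: the Tsekouras index is not defined in the snippet, so the Xie-Beni index (lower is better) stands in for it; β is left out because its role is not described here; fuzzy c-means uses a deterministic farthest-first initialisation chosen only for reproducibility; and the eight two-feature projects are hypothetical.

```python
import itertools
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data, centers, m):
    """FCM membership matrix: u[j][i] is the membership of point j in cluster i."""
    exp = 1.0 / (m - 1.0)  # applied to squared distances, i.e. d ** (2 / (m - 1))
    u = []
    for p in data:
        d2 = [max(sq_dist(p, c), 1e-12) for c in centers]  # floor avoids division by zero
        u.append([1.0 / sum((di / dj) ** exp for dj in d2) for di in d2])
    return u


def fuzzy_c_means(data, k, m, max_iter=100, tol=1e-9):
    """Plain fuzzy c-means with a deterministic farthest-first initialisation (assumption)."""
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(max_iter):
        u = memberships(data, centers, m)
        new_centers = []
        for i in range(k):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wj * p[d] for wj, p in zip(w, data)) / total for d in range(len(data[0]))]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(data, centers, m)


def xie_beni(data, centers, u, m):
    """Xie-Beni validity index (lower is better); stand-in for the Tsekouras index."""
    compact = sum(
        u[j][i] ** m * sq_dist(p, c)
        for j, p in enumerate(data)
        for i, c in enumerate(centers)
    )
    sep = min(sq_dist(a, b) for a, b in itertools.combinations(centers, 2))
    return math.inf if sep == 0 else compact / (len(data) * sep)


def grid_search(data, m_values, k_values):
    """Return (index, m, k) for the configuration with the lowest validity index (k >= 2)."""
    best = None
    for m, k in itertools.product(m_values, k_values):
        centers, u = fuzzy_c_means(data, k, m)
        score = xie_beni(data, centers, u, m)
        if best is None or score < best[0]:
            best = (score, m, k)
    return best


if __name__ == "__main__":
    # Two hypothetical, well-separated groups of projects described by two features.
    projects = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    score, m, k = grid_search(projects, [1.5, 2.0, 2.5, 3.0, 3.5], [2, 3, 4])
    print(f"best m={m}, k={k}, Xie-Beni={score:.4f}")
```

On these two well-separated groups the search prefers k = 2: any extra centre has to split a tight group, and the resulting small separation term inflates the index.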


Fuzzy case‐based‐reasoning‐based imputation for incomplete data in software engineering repositories

Abnane, Idri, Abran (2020). Journal of Software: Evolution and Process. Self-citation.


Missing data is a serious issue in software engineering because it can lead to information loss and bias in data analysis. Several imputation techniques have been proposed to deal with both numerical and categorical missing data. However, most of those techniques simply reuse methods originally designed for numerical data, which is a problem when the missing data relate to categorical attributes. This paper aims (a) to propose a new fuzzy case‐based reasoning (CBR) imputation technique designed for both numerical and categorical data and (b) to evaluate and compare the performance of the proposed technique with that of the k‐nearest neighbor (KNN) imputation technique in terms of error and accuracy under different missing data percentages and missingness mechanisms in four software engineering data sets. The results suggest that the proposed fuzzy CBR technique outperformed KNN in terms of imputation error and accuracy regardless of the missing data percentage, missingness mechanism, and data set used. Moreover, we found that the missingness mechanism has an important impact on the performance of both techniques. The results are encouraging in the sense that using an imputation technique designed for both categorical and numerical data is better than reusing methods originally designed for numerical data.
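The abstract compares the proposed fuzzy CBR imputation with KNN imputation on mixed numerical and categorical data. The fuzzy CBR technique itself is not described here, so the sketch below shows only a generic KNN imputer, under these assumptions: a HEOM-style distance computed over the attributes both records observe, the mean of the k neighbours for numerical gaps, and a majority vote for categorical gaps. `None` marks a missing cell; the helper names, records, attribute layout and k are hypothetical.

```python
import math
from collections import Counter


def mixed_distance(a, b, numeric, ranges):
    """HEOM-style distance over attributes observed in both records.

    Numerical attributes use the range-normalised difference, categorical
    attributes a 0/1 mismatch; attributes missing (None) in either record are
    skipped and the sum is averaged over the attributes actually compared.
    """
    total, used = 0.0, 0
    for idx, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue
        if idx in numeric:
            total += (abs(x - y) / ranges[idx]) ** 2 if ranges[idx] else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return math.sqrt(total / used) if used else math.inf


def knn_impute(records, numeric, k=3):
    """Fill None cells: mean of k neighbours (numerical) or majority vote (categorical)."""
    ranges = {}
    for idx in numeric:
        observed = [r[idx] for r in records if r[idx] is not None]
        ranges[idx] = max(observed) - min(observed) if observed else 0.0
    imputed = [list(r) for r in records]
    for row, rec in enumerate(records):
        for idx, value in enumerate(rec):
            if value is not None:
                continue
            # Donors: other records that observe the attribute being imputed.
            donors = [r for j, r in enumerate(records) if j != row and r[idx] is not None]
            donors.sort(key=lambda r: mixed_distance(rec, r, numeric, ranges))
            values = [r[idx] for r in donors[:k]]
            if not values:
                continue  # nothing to borrow from; leave the gap
            if idx in numeric:
                imputed[row][idx] = sum(values) / len(values)
            else:
                imputed[row][idx] = Counter(values).most_common(1)[0][0]
    return imputed


if __name__ == "__main__":
    # Hypothetical projects: (size in KLOC, language, effort in person-hours).
    projects = [
        (10.0, "java", 100.0),
        (12.0, "java", 120.0),
        (11.0, None, 110.0),
        (50.0, "cobol", 500.0),
        (52.0, "cobol", None),
        (48.0, "cobol", 480.0),
    ]
    for row in knn_impute(projects, numeric={0, 2}, k=2):
        print(row)
```

Reusing a purely numerical KNN here would have to either drop the language column or encode it as numbers, which is the limitation the abstract points to.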


“…In this framework, the observed data are treated as a training set for the learning model, which is then applied to the records with missing values to impute them. K-Nearest Neighbor (KNN) [17], Decision Tree (DT) [18] and Support Vector Regression (SVR) [19] are the most widely used ML techniques for imputation and have achieved considerable success [20].…”

Section: Introduction (mentioning)
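The framework in the quote (train on the observed records, then predict the missing values) can be sketched with a pluggable learner. Assumptions: a single predictor column, `None` marking missing cells, and a one-feature least-squares line as a hypothetical, dependency-free stand-in learner; it is not SVR. A model with the same fit/predict split, such as scikit-learn's `sklearn.svm.SVR`, could be wrapped to take its place.

```python
def fit_simple_ols(xs, ys):
    """Stand-in learner (NOT SVR): one-feature ordinary least squares.

    Returns a predict(x) function, mirroring the fit/predict split of the framework.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def model_based_impute(rows, feature, target, fit):
    """Train on rows where both columns are observed, then predict the missing targets."""
    train = [r for r in rows if r[feature] is not None and r[target] is not None]
    predict = fit([r[feature] for r in train], [r[target] for r in train])
    completed = []
    for r in rows:
        r = list(r)
        if r[target] is None and r[feature] is not None:
            r[target] = predict(r[feature])
        completed.append(r)
    return completed


if __name__ == "__main__":
    # Hypothetical (size, effort) pairs; the last effort value is missing.
    data = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, None)]
    print(model_based_impute(data, feature=0, target=1, fit=fit_simple_ols))
```

Keeping `fit` as a parameter is the point of the design: swapping KNN, a decision tree or SVR in changes only the learner, not the imputation loop.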


Comparing Statistical and Machine Learning Imputation Techniques in Breast Cancer Classification

Chlioui, Abnane, Idri (2020). Lecture Notes in Computer Science. Self-citation.


Missing data imputation is an important task when dealing with crucial data that cannot be discarded, such as medical data. This study evaluates and compares the impact of two statistical and two machine learning imputation techniques on the classification of breast cancer patients, using several evaluation metrics. Mean, Expectation-Maximization (EM), Support Vector Regression (SVR) and K-Nearest Neighbor (KNN) were applied to impute missing data (18% of values, missing completely at random) in the two Wisconsin datasets. Thereafter, we empirically evaluated these four imputation techniques using five classifiers: decision tree (C4.5), Case-Based Reasoning (CBR), Random Forest (RF), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP). In total, 1380 experiments were conducted, and the findings confirmed that classification using machine-learning-based imputation outperformed classification using statistical imputation. Moreover, our experiments showed that SVR was the best imputation method for breast cancer classification.
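The experimental setup above (a fixed share of values removed completely at random, then imputed) can be sketched as follows. `inject_mcar` and `mean_impute` are hypothetical helpers, the table is synthetic, and mean imputation is shown only because it is the simplest of the four techniques listed. Under MCAR every cell has the same chance of being removed, independent of any observed or unobserved value.

```python
import random


def inject_mcar(table, rate, seed=0):
    """Blank out exactly round(rate * n_cells) cells chosen uniformly at random (MCAR)."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(table) for j in range(len(row))]
    chosen = rng.sample(cells, round(rate * len(cells)))
    out = [list(row) for row in table]  # leave the original table untouched
    for i, j in chosen:
        out[i][j] = None
    return out


def mean_impute(table):
    """Statistical baseline: replace each missing cell by its column's observed mean."""
    means = []
    for j in range(len(table[0])):
        observed = [row[j] for row in table if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in table]


if __name__ == "__main__":
    # Synthetic 10 x 5 numeric table standing in for a real dataset.
    table = [[float(i * 5 + j) for j in range(5)] for i in range(10)]
    holed = inject_mcar(table, 0.18, seed=1)
    print(sum(v is None for row in holed for v in row), "cells removed")
    print(mean_impute(holed)[0])
```

With 10 rows, 5 columns and a rate of 0.18, exactly 9 cells are blanked; the same holed table can then be handed to each imputation technique so that they are compared on identical gaps.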


“…Therefore, feature subset selection techniques should be applied to find the optimal set of features. Another important limitation of ASEE techniques is their inability to handle missing values.…”

Section: Introduction (mentioning)


Analysis of cluster center initialization of 2FA‐kprototypes analogy‐based software effort estimation

Amazal, Idri, Abran (2019). Journal of Software: Evolution and Process. Self-citation.


Analogy‐based estimation is one of the most widely used techniques for effort prediction in software engineering. However, existing analogy‐based techniques suffer from an inability to correctly handle nonquantitative data. To deal with this limitation, a new technique called 2FA‐kprototypes was proposed and evaluated. 2FA‐kprototypes is based on the use of the fuzzy k‐prototypes clustering technique. Although fuzzy k‐prototypes algorithms are well known for their efficiency in clustering numerical and categorical data, they are sensitive to the selection of initial cluster centers. In this paper, the impact of cluster center initialization on improving the prediction accuracy of 2FA‐kprototypes was analyzed and discussed using two cluster initialization techniques: centrality‐based initialization and density‐based initialization. The performance of 2FA‐kprototypes using these two initialization techniques was evaluated and compared with that of 2FA‐kprototypes using random initialization over four datasets: ISBSG, COCOMO81, USP05‐FT, and USP05‐RQ. The results showed an improvement in the performance of 2FA‐kprototypes in terms of estimation accuracy when the all‐in method is used.
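The abstract rests on (fuzzy) k-prototypes clustering and on how its initial cluster centres are chosen. The sketch below assumes Huang's k-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus γ times the number of categorical mismatches) and a simple density-based seeding rule chosen for illustration: the densest record first, then records that are both dense and far from the centres already picked. The seeding rule, γ, the helper names and the sample projects are assumptions; this is not the centrality-based or density-based initialisation evaluated in the paper.

```python
def kproto_dissimilarity(a, b, numeric, gamma):
    """Squared Euclidean distance on numerical attributes plus gamma * categorical mismatches."""
    num = sum((a[j] - b[j]) ** 2 for j in numeric)
    cat = sum(1 for j in range(len(a)) if j not in numeric and a[j] != b[j])
    return num + gamma * cat


def density_init(data, k, numeric, gamma=1.0):
    """Hypothetical density-based seeding: returns k records to use as initial prototypes."""
    n = len(data)
    # Density: inverse of (1 + average dissimilarity to every other record).
    density = []
    for i in range(n):
        avg = sum(
            kproto_dissimilarity(data[i], data[j], numeric, gamma) for j in range(n) if j != i
        ) / (n - 1)
        density.append(1.0 / (1.0 + avg))
    chosen = [max(range(n), key=lambda i: density[i])]
    while len(chosen) < k:
        # Prefer records that are dense AND far from the prototypes already chosen.
        nxt = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: density[i]
            * min(kproto_dissimilarity(data[i], data[c], numeric, gamma) for c in chosen),
        )
        chosen.append(nxt)
    return [data[i] for i in chosen]


if __name__ == "__main__":
    # Hypothetical projects: (size in KLOC, application type).
    projects = [(1.0, "web"), (1.2, "web"), (1.1, "web"), (5.0, "batch"), (5.2, "batch")]
    print(density_init(projects, 2, numeric={0}))
```

Unlike random initialisation, this rule is deterministic for a given dataset, which removes one source of run-to-run variation in the resulting clusters and estimates.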
