Effect of Data Preprocessing on Software Effort Estimation

Sehra, Sumeet Kaur; Kaur, Jasneet; Sehra, Sukhjit Singh

doi:10.5120/12130-8506

Cited by 4 publications

(5 citation statements)

References 6 publications

“…Some works [17,19,37] employ LD or PD technique to tackle the missing data problem. In [20,26,21], the researchers concluded that the imputation strategy is more helpful for improving the estimation performance as compared with deletion and ignoring strategies.…”

Section: Solutions for Missing Data Problem in SEE (mentioning)

confidence: 99%

“…Considering the noisy, redundant, or unreliable information in dataset, like in [17], we employ the z-score normalization [56] to preprocess data. For a variable x with mean μ and standard deviation σ, the normalized variable using z-score normalization can be represented as:…”

Section: Data Set (mentioning)

confidence: 99%
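
The z-score normalization mentioned in the statement above can be sketched as follows. This is a minimal illustration of the standard definition, z = (x - μ) / σ, not the citing authors' code: the function name `z_score_normalize` and the sample values are hypothetical, and the population standard deviation is an assumption, since the excerpt does not say which variant was used.

```python
from statistics import mean, pstdev


def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up feature values, for illustration only.
efforts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = z_score_normalize(efforts)
```

After normalization, every variable is expressed in units of its own standard deviation, so features measured on very different scales contribute comparably to distance-based estimators.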

“…Listwise deletion (LD) excludes the samples which contain some missing values [19]. For each variable, pairwise deletion [17] (PD), i.e., the ignoring strategy, removes the specific missing values corresponding to the variable. As compared with LD, PD can preserve more information [29].…”

Section: Solutions for Missing Data Problem (mentioning)

confidence: 99%
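
The difference between listwise and pairwise deletion described in the statement above can be shown on a toy table. This is a hedged sketch: the column names (`size`, `team`, `effort`), the values, and both helper functions are hypothetical, and `None` stands in for a missing value.

```python
# Toy project records; None marks a missing value (all values are made up).
rows = [
    {"size": 10.0, "team": 3.0, "effort": 120.0},
    {"size": None, "team": 5.0, "effort": 300.0},
    {"size": 25.0, "team": None, "effort": 410.0},
    {"size": 40.0, "team": 8.0, "effort": None},
]


def listwise_delete(records):
    """Keep only records in which every variable is observed."""
    return [r for r in records if all(v is not None for v in r.values())]


def pairwise_values(records, column):
    """For one variable, keep every observed value and skip only its own gaps."""
    return [r[column] for r in records if r[column] is not None]


complete_rows = listwise_delete(rows)
per_column_counts = {c: len(pairwise_values(rows, c)) for c in rows[0]}
```

Here listwise deletion keeps a single record, while pairwise deletion still has three observations per variable, which is the sense in which PD preserves more information than LD.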

“…To solve the effort data missing problem, the deletion [17][18], ignoring [19], or imputation strategies are usually used. The imputation strategy was found to be more helpful for improving the estimation performance [20][21].…”

Section: Introduction (mentioning)

confidence: 99%
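
As a point of reference for the imputation strategy named above, the simplest classical technique is mean imputation: each missing value is replaced by the mean of the observed values of the same variable. The sketch below is only that baseline, under the assumption that `None` marks a missing entry; it is not the method proposed by any of the cited works, and the function name and data are hypothetical.

```python
from statistics import mean


def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


# Made-up size measurements with one missing entry.
sizes = [10.0, None, 20.0, 30.0]
imputed_sizes = mean_impute(sizes)
```

Unlike deletion, imputation keeps every record available to the estimator, at the cost of introducing values that were never measured.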

Missing data imputation based on low-rank recovery and semi-supervised regression for software effort estimation

Jing et al. 2016

Proceedings of the 38th International Conference on Software Engineering

Software effort estimation (SEE) is a crucial step in software development. Missing effort data are common in real-world data collection. To handle missing data, existing SEE methods employ the deletion, ignoring, or imputation strategy, and the imputation strategy was found to be more helpful for improving estimation performance. Current imputation methods in SEE use classical imputation techniques, yet these techniques have their respective disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the missing effort data problem. Incomplete data covers two cases: missing drive factors and missing effort labels. We introduce the low-rank recovery technique to address the missing drive factor case, and we employ the semi-supervised regression technique to perform imputation when effort labels are missing. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that: (1) the proposed approach obtains better effort data imputation results than other methods; (2) the data imputed by our approach work well with multiple estimators.

“…In software metrics there are attributes that serve as a scale for measuring software, namely: providing understanding and being readable by every member of the development team [15]. These four parameters will be classified with the K-Nearest Neighbor algorithm to determine which of them influence software testing and measurement so that the software is defect-free.…”

Section: I (unclassified)
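
The K-Nearest Neighbor classification mentioned in the statement above can be sketched in a few lines. This is a generic illustration of the algorithm, not the citing author's setup: the two features, the labels, the training points, and the choice k = 3 are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up (feature vector, label) pairs; the features are assumed to be pre-scaled.
train = [
    ((0.1, 0.9), "low risk"),
    ((0.2, 0.8), "low risk"),
    ((0.3, 0.7), "low risk"),
    ((0.8, 0.2), "high risk"),
    ((0.9, 0.1), "high risk"),
    ((0.7, 0.3), "high risk"),
]
prediction = knn_predict(train, (0.85, 0.15))
```

Because the vote depends on Euclidean distance, features should be put on a comparable scale before classification.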

Analisa Studi Empirik Kerangka Kerja Pengukuran Kualitas Perangkat Lunak Bebas Cacat

Pamuji 2018

JPIT

Testing is a strategic step for determining the quality of the software produced, so that it is accepted by the end user. During testing, errors were found that may put the software at risk of defects. This study establishes a measurement framework to analyze software test metrics for predicting defect risk; the metrics are defect density, defect removal, and lines of code. The data set contains 53 module samples, analyzed statistically with correlation analysis techniques. Of the 3 hypotheses proposed, only 2 were accepted, showing a high significance of defect density and defect removal for software quality measurement.
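
The correlation analysis mentioned in this abstract can be illustrated with the Pearson coefficient. The abstract does not state which coefficient was used, so Pearson is an assumption here, and the variable names and numbers are made up rather than taken from the 53-module data set.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical metric values: quality drops linearly as defect density rises.
defect_density = [0.5, 1.0, 1.5, 2.0, 2.5]
quality_score = [9.0, 8.0, 7.0, 6.0, 5.0]
r = pearson_r(defect_density, quality_score)
```

A coefficient near -1 or +1 indicates a strong linear relationship between a metric and the quality measure, while a value near 0 indicates little linear relationship.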

Automatic Cost Estimation Analysis on Datawarehouse Project with Modified Analogy Based Method

Pratama and Rasywir 2018

2018 International Conference on Electrical Engineering and Computer Science (ICECOS)
