2015
DOI: 10.1016/j.infsof.2015.07.004

An empirical analysis of data preprocessing for machine learning-based software cost estimation

Abstract: Context: Due to the complex nature of the software development process, traditional parametric models and statistical methods often appear to be inadequate to model the increasingly complicated relationship between project development cost and the project features (or cost drivers). Machine learning (ML) methods, with several reported successful applications, have gained popularity for software cost estimation in recent years. Data preprocessing has been claimed by many researchers as a fundamental stage of ML…

Citations: cited by 162 publications (124 citation statements)
References: 98 publications
“…Less attention has been focused on MDT methods themselves. In a more recent study, Huang et al (2015) found that only some of the former software effort estimation studies have considered the significance of the MDTs, of which only Minku and Yao (2011) (Myrtveit et al, 2001;Strike et al, 2001), and the prediction error may be introduced (Mittas and Angelis, 2010). MEI is efficient and has been involved in SEE as the most popular imputation approach; however, it will cause bias to data.…”
Section: Knn Imputation Improvement (mentioning)
confidence: 99%
“…For example, a well-known technique called listwise deletion, had been widely adopted for handling missing values during data-preprocessing, but it potentially impairs the completeness of data and introduces undesirable biases in estimation (Huang et al, 2015). By contrast, missing data imputation methods replace missing variables by artificial estimates (Song et al, 2008); at the same time maintain the data completeness.…”
Section: Introduction (mentioning)
confidence: 99%
“…They concluded that regression trees or analogy-based methods are the best performers and offered means to address the conclusion instability issue. In (Huang et al 2015) several data preprocessing techniques were empirically assessed on the effectiveness of machine learning methods for effort estimation. The results indicate that data preprocessing techniques may significantly influence the predictions, but sometimes it might have negative impacts on prediction performance.…”
Section: Framework For Benchmarking Prediction Models (mentioning)
confidence: 99%
“…Software project managers need to be able to estimate the effort and cost of development early in the life cycle, as it affects the success of software project management (Huang et al 2015).…”
mentioning
confidence: 99%