2024
DOI: 10.3390/buildings14040933

Strategies for Imputing Missing Values and Removing Outliers in the Dataset for Machine Learning-Based Construction Cost Prediction

Haneul Lee,
Seokheon Yun

Abstract: Accurately predicting construction costs during the initial planning stages is crucial for the successful completion of construction projects. Recent advancements have introduced various machine learning-based methods to enhance cost estimation precision. However, the accumulation of authentic construction cost data is not straightforward, and existing datasets frequently exhibit a notable presence of missing values, posing challenges to precise cost predictions. This study aims to analyze diverse substitution…

Cited by 1 publication (2 citation statements)
References 9 publications

“…Table 1 outlines the steps that are applied to the raw dataset, resulting in a reduction of feature columns from 79 to 58. Handling poor-quality data is essential in ML modelling; the Expectation–Maximization (EM) algorithm [49, 50] is one of the widely used iterative methods for finding maximum likelihood or maximum posteriori estimates of parameters in statistical models. However, in the collected study dataset, the feature columns containing incomplete data are found to be irrelevant to the intended diagnosis, and thus were identified and safely filtered with the aid of expert ophthalmologists.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…Identifying outliers often requires statistical methods or domain expertise [50]. Common approaches include standard deviation, median absolute deviation, z-score, boxplot and ML techniques like clustering and anomaly detection algorithms.…”
Section: Methods (citation type: mentioning)
confidence: 99%
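
The statistical approaches named in the statement above (z-score, median absolute deviation, boxplot) can be sketched in a few lines of Python. This is an illustrative sketch only, not the method of the cited paper or of the citing study: the function names, the thresholds (3.0, 3.5, 1.5) and the sample `costs` values are assumptions chosen for the example, and only the standard library is used.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose z-score magnitude exceeds the threshold.

    Uses the population standard deviation; the threshold of 3.0 is a
    common convention, assumed here for illustration.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]


def mad_outliers(values, threshold=3.5):
    """Flag values using the modified z-score based on the median absolute deviation.

    The 0.6745 scaling constant and the 3.5 cutoff are widely quoted
    conventions, assumed here rather than taken from the source.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical unit costs with one obviously extreme entry.
costs = [100, 102, 98, 101, 99, 103, 97, 100, 500]

print("z-score:", zscore_outliers(costs))
print("MAD:    ", mad_outliers(costs))
print("boxplot:", iqr_outliers(costs))
```

On a sample this small, the extreme value inflates the mean and standard deviation enough that the z-score rule with a threshold of 3.0 flags nothing, while the median-based and boxplot rules still isolate it. That difference is one reason the choice of outlier rule tends to need domain judgment as well as statistics.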