Strategies for Imputing Missing Values and Removing Outliers in the Dataset for Machine Learning-Based Construction Cost Prediction

Lee, Haneul; Yun, Seokheon

doi:10.3390/buildings14040933

Cited by 1 publication (2 citation statements)

References 9 publications


“…Table 1 outlines the steps that are applied to the raw dataset, resulting in a reduction of feature columns from 79 to 58. Handling poor-quality data is essential in ML modelling; the Expectation–Maximization (EM) algorithm [49, 50] is one of the widely used iterative methods for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical models. However, in the collected study dataset, the feature columns containing incomplete data were found to be irrelevant to the intended diagnosis, and thus were identified and safely filtered out with the aid of expert ophthalmologists.…”

Section: Methods (mentioning)

confidence: 99%


Smart decision support system for keratoconus severity staging using corneal curvature and thinnest pachymetry indices

Muhsin, Qahwaji, AlShawabkeh et al. 2024, Eye and Vis


Background: This study proposes a decision support system created in collaboration with machine learning experts and ophthalmologists for detecting keratoconus (KC) severity. The system employs an ensemble machine model and minimal corneal measurements.

Methods: A clinical dataset is initially obtained from Pentacam corneal tomography imaging devices, which undergoes pre-processing and addresses imbalanced sampling through the application of an oversampling technique for minority classes. Subsequently, a combination of statistical methods, visual analysis, and expert input is employed to identify Pentacam indices most correlated with severity class labels. These selected features are then utilized to develop and validate three distinct machine learning models. The model exhibiting the most effective classification performance is integrated into a real-world web-based application and deployed on a web application server. This deployment facilitates evaluation of the proposed system, incorporating new data and considering relevant human factors related to the user experience.

Results: The performance of the developed system is experimentally evaluated, and the results revealed an overall accuracy of 98.62%, precision of 98.70%, recall of 98.62%, F1-score of 98.66%, and F2-score of 98.64%. The application's deployment also demonstrated precise and smooth end-to-end functionality.

Conclusion: The developed decision support system establishes a robust basis for subsequent assessment by ophthalmologists before potential deployment as a screening tool for keratoconus severity detection in a clinical setting.
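The F1 and F2 figures quoted in the abstract are both instances of the general F-beta score, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below recomputes them from the reported precision and recall as a rough consistency check. It assumes the reported values can be treated as single aggregate numbers; the abstract does not say how the scores were averaged across severity classes, so agreement here is only indicative. The function name `f_beta` is illustrative and not taken from the cited paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta score).

    beta > 1 weights recall more heavily; beta = 1 gives the usual F1.
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta > 0")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta_sq) * precision * recall / denominator


# Values quoted in the abstract, expressed as fractions (assumed aggregate).
reported_precision = 0.9870
reported_recall = 0.9862

f1 = f_beta(reported_precision, reported_recall, beta=1.0)
f2 = f_beta(reported_precision, reported_recall, beta=2.0)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

Under that assumption the recomputed values round to the 98.66% and 98.64% quoted in the abstract.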



“…Identifying outliers often requires statistical methods or domain expertise [50]. Common approaches include standard deviation, median absolute deviation, z-score, boxplot, and ML techniques like clustering and anomaly detection algorithms.…”

Section: Methods (mentioning)

confidence: 99%
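Three of the statistical approaches named in the statement above (z-score, median absolute deviation, and the boxplot rule) can be sketched with the Python standard library alone. This is a minimal illustration, not the procedure used in either paper: the function names, the thresholds (3.0, 3.5, and 1.5), and the sample values are all assumptions chosen for the example.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose |z-score| exceeds threshold (population std dev)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / std > threshold]


def mad_outliers(values, threshold=3.5):
    """Flag values using the modified z-score based on the median
    absolute deviation (MAD); 0.6745 rescales MAD to be comparable
    with a standard deviation under normality."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # more than half the values equal the median
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one obviously extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 95]

print("z-score:", zscore_outliers(sample))
print("MAD:    ", mad_outliers(sample))
print("IQR:    ", iqr_outliers(sample))
```

On a sample this small the extreme value inflates the mean and standard deviation enough that the z-score rule with a threshold of 3.0 does not flag it, while the median-based MAD and boxplot rules do. That is one reason robust statistics are often preferred for small datasets.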