2021
DOI: 10.1016/j.autcon.2021.103896
Integrating feature engineering, genetic algorithm and tree-based machine learning methods to predict the post-accident disability status of construction workers

Cited by 58 publications (33 citation statements)
References 90 publications
“…According to the analysis results, the optimum hyperparameters were identified as 350, 3, 0.1, 5, 0.6, and 42, while searching the parameters with step sizes of 50, 1, 0.01, 1, 0.1, and 1, respectively. Figure 3 also illustrates the changes in the AUROC values with the number of trees, which is one of the most influential parameters of tree-based ML methods [31]. The AUROC curve of the proposed SGB model shows that the model achieved an AUROC value of 0.741 (Fig.…”
Section: Results (mentioning; confidence: 99%)
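The step-wise grid search described in this excerpt can be sketched as follows. Only the step sizes and the reported optimum come from the quote; the search ranges, the parameter names, and the `cv_auroc` scorer are illustrative assumptions:

```python
from itertools import product

# Step-wise grids mirroring the quoted step sizes (50, 1, 0.01, 1, 0.1);
# the ranges themselves are assumptions. The sixth parameter (random
# seed, reported optimum 42) is omitted from the grid for brevity.
grid = {
    "n_trees":       list(range(50, 401, 50)),         # step 50
    "max_depth":     list(range(1, 6)),                # step 1
    "learning_rate": [i / 100 for i in range(1, 11)],  # step 0.01
    "min_samples":   list(range(1, 11)),               # step 1
    "subsample":     [i / 10 for i in range(1, 11)],   # step 0.1
}

candidates = list(product(*grid.values()))

def cv_auroc(params):
    """Hypothetical scorer: fit an SGB model with `params` and return
    its cross-validated AUROC. Model fitting is out of scope here."""
    raise NotImplementedError

# The reported optimum (350, 3, 0.1, 5, 0.6) is one point of this grid.
print((350, 3, 0.1, 5, 0.6) in candidates)  # True
```

In practice the scorer would be evaluated at every grid point (or via a genetic algorithm, as in the cited article's title) and the best AUROC kept.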
“…The reason for applying the RUS algorithm instead of other commonly considered data resampling methods, such as random over-sampling (ROS) or the synthetic minority over-sampling technique (SMOTE), is its high performance, as reported in a comparative study [17]. Besides, artificial cases are not generated by the RUS method, unlike the ROS or SMOTE algorithms [31]. It is important to state that the resampling algorithm was not applied to the testing set, so that the model's performance could be examined under the imbalanced class distribution commonly observed in real-life conditions [32].…”
Section: Data Preprocessing (mentioning; confidence: 99%)
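Random under-sampling (RUS) as described in the excerpt can be sketched in pure NumPy; this is a minimal illustration, not the cited implementation. Note that it only discards majority-class rows (no artificial cases are generated) and, as the excerpt stresses, it would be applied to the training set only:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy imbalanced training data: 90 majority (0) vs 10 minority (1) rows.
y_train = np.array([0] * 90 + [1] * 10)
X_train = rng.normal(size=(100, 3))

def random_undersample(X, y, rng):
    """Minimal RUS sketch: keep all minority rows and a random subset
    of majority rows of equal size. Unlike ROS or SMOTE, no synthetic
    cases are created."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if c != minority:
            idx = rng.choice(idx, size=n_min, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Balance the training split; the test split is left untouched.
X_bal, y_bal = random_undersample(X_train, y_train, rng)
print(np.bincount(y_bal))  # [10 10]
```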
“…The dataset after step (1) contains 34 categorical features, such as passenger gender, flight cabin, etc. Since machine learning models perform better on numerical features, the categorical features should be transformed into their numerical counterparts by encoding [15], [16]. Among them, the flight cabin is subjected to label encoding [17] due to its ordinal values.…”
Section B: Data Cleaning and Encoding (mentioning; confidence: 99%)
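Label encoding of an ordinal feature, as applied above to the flight cabin, amounts to a simple ordered mapping. The cabin levels and their order below are assumptions for illustration, not values from the cited paper:

```python
# Assumed ordinal cabin hierarchy (illustrative only).
cabin_order = ["economy", "premium_economy", "business", "first"]
encoding = {level: i for i, level in enumerate(cabin_order)}

# Label encoding preserves the order, unlike one-hot encoding,
# which would be the usual choice for nominal features like gender.
cabins = ["business", "economy", "first", "economy"]
encoded = [encoding[c] for c in cabins]
print(encoded)  # [2, 0, 3, 0]
```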
“…In the second stage, the dataset is resampled to restore class balance. In the related literature, random undersampling, random oversampling, or the synthetic minority oversampling technique (SMOTE) [15] are often used for resampling. Considering that the former two approaches carry the risk of data loss or model overfitting, SMOTE [20], based on the K-nearest-neighbor idea, is applied to balance the dataset.…”
Section: The Incomplete Data Processing Layer (mentioning; confidence: 99%)
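The K-nearest-neighbor idea behind SMOTE can be sketched in pure NumPy; this is a minimal illustration, not the cited implementation. Each synthetic case is interpolated at a random point on the segment between a minority sample and one of its k nearest minority neighbours:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote(X_min, n_new, k, rng):
    """Minimal SMOTE sketch over the minority class only: pick a
    sample, pick one of its k nearest minority neighbours, and
    interpolate between the two."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1 : k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                     # position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Generate 15 synthetic minority cases from 10 real ones.
X_minority = rng.normal(size=(10, 2))
X_new = smote(X_minority, n_new=15, k=3, rng=rng)
print(X_new.shape)  # (15, 2)
```

Because synthetic points lie between real minority samples, SMOTE avoids the exact-duplicate overfitting risk of random oversampling that the excerpt mentions.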
“…It is possible to classify the techniques as either image processing or machine learning. Filters, morphological analysis, statistical approaches, and percolation techniques are used in image-processing methods for crack detection [7] [8], and no model training process is necessary. With machine learning, however, a dataset of images is gathered and fed into the chosen machine learning model during the training phase.…”
Section: Introduction (mentioning; confidence: 99%)
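The training-free image-processing route (a global threshold followed by a morphological step) can be sketched in pure NumPy on a toy patch. The threshold value, the 3x3 structuring element, and the toy image are illustrative assumptions, not the cited pipelines:

```python
import numpy as np

def dilate(mask, iterations=1):
    """3x3 binary dilation via shifted ORs: a pixel becomes True if
    any of its 8 neighbours (or itself) was True."""
    out = mask.copy()
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1)
        acc = np.zeros_like(out)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                acc |= padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        out = acc
    return out

# Toy grayscale patch: dark pixels (low intensity) are crack candidates.
img = np.full((8, 8), 200, dtype=np.uint8)
img[3, 1:7] = 30                 # a thin dark "crack"
crack_mask = img < 100           # simple global threshold
cleaned = dilate(crack_mask)     # morphological step joins fragments
print(crack_mask.sum(), cleaned.sum())  # 6 24
```

No model is fit at any point, which is the contrast with the machine-learning route the excerpt describes.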