Handling Imbalanced Data in Road Crash Severity Prediction by Machine Learning Algorithms

Lym

2021

Along with rapid demographic change, increasing attention has been paid to the crash risk of older drivers. Given seniors' involvement in crashes and their physical vulnerability, it is crucial to develop models that accurately predict the severity of senior-involved crashes. The challenge is coping with an imbalanced severity class distribution and the ordered nature of crash severities, both of which complicate severity classification. This study therefore investigates how accounting for the ordinal nature of severity and handling the imbalanced class distribution affect prediction performance. Using vehicle crash data from Ohio, U.S., as an example, eight machine learning classifiers (logistic regression, ordered logistic regression, random forest, and ordered random forest, each with and without handling of imbalanced classes) are proposed and their performances compared. The results show that the balancing strategy improves performance in predicting severe crashes, whereas the effect of accounting for the ordinal nature varies across models. Specifically, the ordered random forest classifier without balancing appears superior in overall prediction accuracy, and the ordered random forest with balancing outperforms the others in predicting more severe crashes.
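The abstract does not say which balancing strategy was used. As a minimal sketch of one common option, random oversampling, the following duplicates minority-class records until every severity class is as large as the largest one. The function name `random_oversample`, the toy records, and the severity coding are illustrative assumptions, not the study's data or method.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equally sized."""
    rng = random.Random(seed)
    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 0 = property damage only, 1 = injury, 2 = fatal.
rows = [[i] for i in range(10)]
labels = [0] * 7 + [1] * 2 + [2] * 1

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class counts before balancing
print(Counter(balanced_labels))  # class counts after balancing
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.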

Section: Methods (mentioning)

confidence: 99%

Section: Introduction (mentioning)

confidence: 99%

Developing Crash Severity Model Handling Class Imbalance and Implementing Ordered Nature: Focusing on Elderly Drivers

Lym

2021

“…ML methods are more flexible with no or fewer model assumptions for input variables, and also have better fitting characteristics. Some of the commonly used ML approaches used in crash injury severity prediction include artificial neural networks (ANN) [58, 59, 60], random forest [54, 61, 62], support vector machines (SVM) [51, 63, 64], naïve Bayes [65, 66, 67], K-means clustering (KC) [68, 69, 70], and decision trees (DT) [71, 72, 73].…”

Section: Related Work (mentioning)

confidence: 99%

Exploring the Injury Severity Risk Factors in Fatal Crashes with Neural Network

Jamal

Umer

2020

A better understanding of the circumstances contributing to the severity outcome of traffic crashes is an important goal of road safety studies. An in-depth crash injury severity analysis is vital for the proactive implementation of appropriate mitigation strategies. This study proposes an improved feed-forward neural network (FFNN) model for predicting the injury severity of individual crashes, using three years (2017–2019) of crash data collected along 15 rural highways in the Kingdom of Saudi Arabia (KSA). A total of 12,566 crashes were recorded during the study period, with a binary injury severity outcome (fatal or non-fatal injury) as the variable to be predicted. An FFNN architecture with back-propagation (BP) as the training algorithm, a logistic activation function, and six neurons in the hidden layer yielded the best model performance. Model predictions for the test data were analyzed using evaluation metrics such as overall accuracy, sensitivity, and specificity. The prediction results showed the adequacy and robust performance of the proposed method. A detailed sensitivity analysis of the optimized NN was also performed to show the relative influence of different predictor variables on the resulting crash injury severity. The sensitivity analysis indicated that traffic volume, average travel speed, weather conditions, on-site damage conditions, road and vehicle type, and pedestrian involvement are the most sensitive variables. The methods applied in this study could be used in big data analysis of crash data, which can serve as a rapid and useful tool for policymakers to improve highway safety.
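For a binary fatal/non-fatal outcome, the three evaluation metrics named in the abstract can be computed from the confusion-matrix counts. The sketch below uses the standard definitions; the function name `binary_metrics` and the toy label vectors are assumptions for illustration and are unrelated to the study's data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),  # share of all crashes classified correctly
        "sensitivity": tp / (tp + fn),       # share of fatal crashes detected
        "specificity": tn / (tn + fp),       # share of non-fatal crashes detected
    }


# Hypothetical labels: 1 = fatal, 0 = non-fatal.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when fatal crashes are rare, because a model can reach high accuracy while missing most of the rare class.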

“…Although the accuracy was found to be very low, the authors did not apply any dimension reduction technique such as principal component analysis (PCA) on the crash dataset to overcome the problems of correlation between the input variables. Fiorentini and Losa [24] investigated the effect of applying balancing techniques on crash data on the performance of multiple ML models such as random tree, KNN, LR, and RF. It was found that introducing balancing technique enhanced the prediction power of the developed models.…”

Section: Literature Review (mentioning)

confidence: 99%

Traffic Crash Severity Prediction—A Synergy by Hybrid Principal Component Analysis and Machine Learning Models

Assi

2020

The accurate prediction of road traffic crash (RTC) severity contributes to generating crucial information, which can be used to adopt appropriate measures to reduce the aftermath of crashes. This study aims to develop a hybrid system using principal component analysis (PCA) with multilayer perceptron neural networks (MLP-NN) and support vector machines (SVM) in predicting RTC severity. PCA shows that the first nine components have an eigenvalue greater than one. The cumulative variance percentage explained by these principal components was found to be 67%. The prediction accuracies of the models developed using the original attributes were compared with those of the models developed using principal components. It was found that the testing accuracies of MLP-NN and SVM increased from 64.50% and 62.70% to 82.70% and 80.70%, respectively, after using principal components. The proposed models would be beneficial to trauma centers in predicting crash severity with high accuracy so that they would be able to prepare for appropriate and prompt medical treatment.
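The component selection described above, keeping components whose eigenvalue exceeds one and reporting their cumulative share of variance, can be sketched as follows. The function name `kaiser_selection` and the eigenvalues are made up for illustration; they are not the values from the study, which reports nine retained components explaining 67% of the variance.

```python
def kaiser_selection(eigenvalues, threshold=1.0):
    """Count components with eigenvalue above the threshold and
    return that count with the fraction of total variance they explain."""
    ordered = sorted(eigenvalues, reverse=True)
    kept = [value for value in ordered if value > threshold]
    explained = sum(kept) / sum(ordered)
    return len(kept), explained


# Hypothetical eigenvalues of a correlation matrix for six input variables
# (they sum to 6 + 2 = 8 only because these toy values were chosen freely).
eigenvalues = [3.0, 2.0, 1.5, 0.75, 0.5, 0.25]

n_kept, explained = kaiser_selection(eigenvalues)
print(n_kept, explained)
```

In a full pipeline the retained component scores would replace the original attributes as inputs to the classifier, which is the comparison the abstract describes for MLP-NN and SVM.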