2020
DOI: 10.3390/infrastructures5070061

Handling Imbalanced Data in Road Crash Severity Prediction by Machine Learning Algorithms

Abstract: Crash severity is undoubtedly a fundamental aspect of a crash event. Although machine learning algorithms for predicting crash severity have recently gained interest by the academic community, there is a significant trend towards neglecting the fact that crash datasets are acutely imbalanced. Overlooking this fact generally leads to weak classifiers for predicting the minority class (crashes with higher severity). In this paper, in order to handle imbalanced accident datasets and provide a better prediction fo…

citations
Cited by 93 publications
(53 citation statements)
references
References 52 publications
“…It focuses on how to predict minority classes more accurately by controlling the false positive rate increased. One of the solutions is based on the sampling strategy, which is broadly categorized by undersampling and oversampling [ 16 ]. The former (i.e., undersampling) is a sampling approach that reduces the size of a majority class so as to be “balanced” with that of a minority class, whereas the latter (i.e., oversampling) is to duplicate a minority class to increase its size.…”
Section: Methods
Citation type: mentioning
confidence: 99%
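The two sampling strategies described in the statement above can be sketched in a few lines of Python. This is a minimal illustration only, not the method of the cited paper or of the citing authors: the function names `random_undersample` and `random_oversample`, the class labels "slight" and "severe", and the toy data are all hypothetical, and oversampling is implemented here as plain random duplication, as the quote describes it.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label, preserving input order."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class shrinks to the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out.extend(rng.sample(rows, target))  # sample without replacement
        y_out.extend([label] * target)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows so every class grows to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Keep every original row, then add random duplicates up to the target.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


# Hypothetical toy data: 8 low-severity crashes and 2 high-severity crashes.
X = [[i] for i in range(10)]
y = ["slight"] * 8 + ["severe"] * 2

print(Counter(random_undersample(X, y)[1]))
print(Counter(random_oversample(X, y)[1]))
```

Undersampling discards majority-class rows and oversampling repeats minority-class rows, so in practice either step would normally be applied to the training split only, leaving the test split at its original class ratio.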
“…Thus, the classifiers tend to predict majority classes more accurately than minority counterparts [ 14 , 15 ]. As Fiorentini and Losa (2020) [ 16 ] pointed out, most research works predicting crash severity overlooked the imbalanced class problem, leading them to develop and compare crash severity prediction models with or without handling the imbalanced problem. The authors recommended addressing the imbalanced issue when predicting crash severity.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…ML methods are more flexible with no or fewer model assumptions for input variables, and also have better fitting characteristics. Some of the commonly used ML approaches used in crash injury severity prediction include artificial neural networks (ANN) [ 58 , 59 , 60 ], random forest [ 54 , 61 , 62 ], support vector machines (SVM) [ 51 , 63 , 64 ], naïve Bayes [ 65 , 66 , 67 ], K-means clustering (KC) [ 68 , 69 , 70 ], and decision trees (DT) [ 71 , 72 , 73 ].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Although the accuracy was found to be very low, the authors did not apply any dimension reduction technique such as principal component analysis (PCA) on the crash dataset to overcome the problems of correlation between the input variables. Fiorentini and Losa [ 24 ] investigated the effect of applying balancing techniques on crash data on the performance of multiple ML models such as random tree, KNN, LR, and RF. It was found that introducing balancing technique enhanced the prediction power of the developed models.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%