2019
DOI: 10.3390/su11051327

Analysis of Factors Affecting Hit-and-Run and Non-Hit-and-Run in Vehicle-Bicycle Crashes: A Non-Parametric Approach Incorporating Data Imbalance Treatment

Abstract: Hit-and-run (HR) crashes are crashes in which the driver of the offending vehicle flees the scene without aiding possible victims or notifying authorities for emergency medical services. This paper aims to identify significant predictors of HR and non-hit-and-run (NHR) outcomes in vehicle-bicycle crashes based on the classification and regression tree (CART) method. An oversampling technique is applied to deal with the data imbalance problem, where the number of minority instances (HR crashes) is much l…

Cited by 17 publications (9 citation statements) | References 27 publications
“…Then, for the instances far away from the borderline, an extrapolation technique is used to generate minority class instances. On the other hand, for the instances closer to the borderline, an interpolation technique similar to SMOTE is used to generate the minority instances [26].…”
Section: Support Vector Machine with SMOTE (SVM-SMOTE)
Confidence: 99%
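The interpolation-versus-extrapolation distinction in the statement above can be sketched in a few lines. This is a hypothetical toy illustration, not the cited paper's implementation: `interpolate` places a synthetic minority point on the segment between an instance and a minority neighbor (the SMOTE-style move used near the borderline), while `extrapolate` pushes the new point beyond the instance, away from the neighbor (the move used far from the borderline).

```python
import random

def interpolate(x, neighbor, rng):
    """SMOTE-style interpolation: synthetic point on the segment x -> neighbor."""
    gap = rng.random()  # gap in [0, 1) keeps the point between the two minority instances
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

def extrapolate(x, neighbor, rng):
    """Extrapolation: synthetic point beyond x, on the far side from the neighbor."""
    gap = rng.random()
    return [xi + gap * (xi - ni) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)  # seeded for reproducibility
x, nb = [1.0, 1.0], [2.0, 0.0]   # two hypothetical minority instances
inside = interpolate(x, nb, rng)   # lies between x and nb
outside = extrapolate(x, nb, rng)  # lies outside the segment, beyond x
```

Both moves generate only minority-class instances; the choice between them depends on the instance's distance to the decision borderline, which in SVM-SMOTE is estimated from the support vectors.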
“…To handle the data inconsistent distribution problem, eight advanced balancing techniques from the literature have been applied in the preprocessing stage, namely, SMOTE (Synthetic Minority Oversampling TEchnique) [22], BL-SMOTE (Borderline SMOTE) [23], SMOTE-ENN [24], K-means SMOTE [25], SMOTE-NC (SMOTE Nominal-Continuous) [22], SMOTE-Tomek (SMOTE with Tomek links) [24], SVM-SMOTE (Support Vector Machine with SMOTE) [26] and ADASYN (ADaptive SYNthetic sampling approach) [27]. These resampling techniques significantly enhance the behavior of the classifiers, i.e., they dramatically decrease the classifiers' minority class misclassification.…”
Section: Introduction
Confidence: 99%
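All eight techniques listed above share one goal: equalizing class counts before training. As a minimal sketch of that goal (naive random oversampling by duplication, with hypothetical counts; SMOTE and its variants instead synthesize new points rather than duplicating existing ones):

```python
import random

def random_oversample(majority, minority, rng):
    """Duplicate randomly chosen minority instances until the classes balance.
    SMOTE-family methods replace this duplication step with synthesis of new
    points, but the resulting class counts are the same."""
    need = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(need)]
    return majority, minority + extra

rng = random.Random(42)
maj = list(range(90))        # 90 majority-class (e.g., NHR) instances
mino = list(range(90, 100))  # 10 minority-class (e.g., HR) instances
maj2, mino2 = random_oversample(maj, mino, rng)
# After resampling, both classes contribute 90 instances to training
```

Because the classifier now sees both classes equally often, its loss is no longer dominated by the majority class, which is why these techniques reduce minority-class misclassification.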
“…To validate Model I and II, k-fold cross-validation was conducted, which is a popular procedure for estimating the performance of a classification algorithm on a data set [46,47]. In this study, k was set up as ten; thus, the validation was tenfold.…”
Section: Validation
Confidence: 99%
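The tenfold split described above partitions the data set into ten disjoint folds; each fold serves once as the validation set while the other nine form the training set. A minimal sketch of the partitioning step (pure Python, with a hypothetical data set of 100 instances):

```python
def k_fold_indices(n, k=10):
    """Partition indices 0..n-1 into k disjoint folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = k_fold_indices(100, k=10)
# Iterate: fold i is the validation set, the union of the rest is the training set
for i, val_idx in enumerate(folds):
    train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
```

In practice indices are shuffled (or stratified by class) before partitioning so each fold reflects the overall class distribution.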
“…The error rates for the comparison of prediction performance of the models were calculated using Equation (8). The average error rate in the four cases was used to analyze the accuracy of each fold; then, the total average [47] and the standard deviation of the ten folds were used as model-validation criteria.…”
Section: Validation
Confidence: 99%
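The validation criteria described above, the total average and standard deviation of the per-fold error rates, reduce to two summary statistics. A sketch with hypothetical per-fold error rates (the cited paper's actual values are not reproduced here):

```python
from statistics import mean, stdev

# Hypothetical error rates from the ten validation folds
fold_errors = [0.12, 0.10, 0.15, 0.11, 0.13, 0.09, 0.14, 0.12, 0.10, 0.13]

avg = mean(fold_errors)   # total average error rate across folds
sd = stdev(fold_errors)   # fold-to-fold variability of the error rate
```

A low average indicates good predictive accuracy; a low standard deviation indicates the model's performance is stable across folds rather than dependent on a lucky split.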
“…This would induce a bias toward the majority instances. The trained model would classify the majority instances much more accurately while misclassifying the minority instances, making the model fail to be informative [24,25]. When the identification of minority instances is of interest, this misclassification could result in substantial costs.…”
Section: Introduction
Confidence: 99%
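The bias described above is the accuracy paradox: on imbalanced data, a model can score high overall accuracy while failing entirely on the minority class. A toy demonstration with hypothetical counts (95 NHR vs. 5 HR crashes) and a degenerate majority-class classifier:

```python
# Hypothetical imbalanced data set: 95 majority (NHR) vs. 5 minority (HR) crashes
labels = ["NHR"] * 95 + ["HR"] * 5

# A degenerate classifier that always predicts the majority class
predictions = ["NHR"] * 100

# Overall accuracy looks excellent...
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# ...yet not a single HR crash is identified
hr_recall = sum(p == y == "HR" for p, y in zip(predictions, labels)) / 5
```

Here accuracy is 95% while HR recall is zero, so when identifying the minority class is what matters (as with HR crashes), overall accuracy alone is uninformative and per-class metrics or resampling are needed.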