Train dispatching (TD) is at the forefront of all rail operations that transport passengers or goods. Recent technological advances and the explosion of digital data have introduced data-driven methods (DDMs) into rail operations. In this study, DDMs for the TD problem are briefly explored, focusing on relevant studies of delay distribution, delay propagation, and timetable rescheduling. Data-driven TD methods, including statistical methods (SM), graphical models (GM), and machine learning (ML) methods, are reviewed. Key issues in establishing different data-driven models for the TD problem are then addressed. ML methods are identified as among the most promising DDMs for developing innovative TD methods, because they can exploit the rich data obtained from train operations. This study highlights the potential for designing new alternatives in the three key fields of interest and provides directions for further research on TD, including ML-driven TD and intelligent TD.
INDEX TERMS Data-driven, delay distribution, delay propagation, timetable rescheduling, train dispatching, machine learning.
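As a point of reference for the data-driven delay-propagation models discussed above, the sketch below shows a deliberately simple deterministic baseline for how a primary delay propagates along a single train's path. It is not a method from the reviewed studies. The function name `propagate_delay` and the assumption that each running or dwell buffer absorbs delay up to its full size are illustrative. It also ignores knock-on (secondary) delays to other trains caused by headway conflicts, which is precisely where data-driven models are expected to add value.

```python
def propagate_delay(initial_delay, buffers):
    """Return the remaining delay (minutes) after each successive section or stop.

    Illustrative assumption: each buffer (running or dwell time supplement)
    absorbs delay up to its size, and the delay never drops below zero,
    i.e. the train is not allowed to run early.
    """
    remaining = []
    delay = initial_delay
    for buffer in buffers:
        # The buffer absorbs as much of the current delay as it can.
        delay = max(0.0, delay - buffer)
        remaining.append(delay)
    return remaining


# Hypothetical example: a 5-minute primary delay and four downstream buffers.
print(propagate_delay(5.0, [1.0, 0.5, 2.0, 2.0]))  # [4.0, 3.5, 1.5, 0.0]
```

Under this baseline, the recoverable delay is simply the initial delay minus the last remaining value; real operations deviate from it because buffers are not always fully exploited and conflicts with other trains create new delays.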
Accurate prediction of recoverable train delay can support train dispatchers' decision-making in timetable rescheduling and help improve service reliability. In this paper, we present the results of an effort aimed at developing a primary delay recovery (PDR) prediction model using train operation records from the Wuhan-Guangzhou (W-G) high-speed railway. To this end, we first identified the main variables that affect delay recovery, including dwell buffer time, running buffer time, the magnitude of the primary delay, and the influence of individual sections. Four models were then applied and calibrated to predict the PDR. The validation results on the test datasets indicate that the random forest regression (RFR) model outperforms the three alternative models, namely multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), in terms of prediction accuracy. Specifically, with a prediction tolerance of less than 1 min, the RFR model achieves a prediction accuracy of up to 80.4%, compared with 44.4%, 78.5%, and 78.5% for the MLR, SVM, and ANN models, respectively.
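The accuracy figures above are stated relative to a prediction tolerance. A minimal sketch of such a metric follows, assuming that accuracy is the share of test samples whose absolute error between predicted and observed PDR is strictly below the tolerance; the abstract does not spell out the exact definition. The function name `accuracy_within_tolerance` and the sample values are hypothetical and are not data from the W-G line.

```python
def accuracy_within_tolerance(actual, predicted, tolerance=1.0):
    """Share of samples whose absolute prediction error is below `tolerance`.

    Assumed definition: a prediction counts as accurate when
    |predicted - actual| < tolerance (both in minutes).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    # Count predictions that fall strictly inside the tolerance band.
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) < tolerance)
    return hits / len(actual)


# Hypothetical observed and predicted PDR values (minutes) for four test trains.
observed = [3.0, 5.0, 2.0, 0.0]
forecast = [3.5, 6.5, 2.2, 0.9]
print(accuracy_within_tolerance(observed, forecast))  # 0.75
```

Applying the same function to each model's predictions on a shared test set is one way to produce a side-by-side comparison such as RFR versus MLR, SVM, and ANN; note that with a strict inequality an error of exactly 1 min does not count as accurate.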