Individual passenger travel patterns have significant value in understanding passenger’s behavior, such as learning the hidden clusters of locations, time, and passengers. The learned clusters further enable commercially beneficial actions such as customized services, promotions, data-driven urban-use planning, peak hour discovery, and so on. However, the individualized passenger modeling is very challenging for the following reasons: 1) The individual passenger travel data are multi-dimensional spatiotemporal big data, including at least the origin, destination, and time dimensions; 2) Moreover, individualized passenger travel patterns usually depend on the external environment, such as the distances and functions of locations, which are ignored in most current works. This work proposes a multi-clustering model to learn the latent clusters along the multiple dimensions of Origin, Destination, Time, and eventually, Passenger (ODT-P). We develop a graph-regularized tensor Latent Dirichlet Allocation (LDA) model by first extending the traditional LDA model into a tensor version and then applies to individual travel data. Then, the external information of stations is formulated as semantic graphs and incorporated as the Laplacian regularizations; Furthermore, to improve the model scalability when dealing with massive data, an online stochastic learning method based on tensorized variational Expectation-Maximization algorithm is developed. Finally, a case study based on passengers in the Hong Kong metro system is conducted and demonstrates that a better clustering performance is achieved compared to state-of-the-arts with the improvement in point-wise mutual information index and algorithm convergence speed by a factor of two.
Time-varying dynamic Bayesian network (TVDBN) is essential for describing time-evolving directed conditional dependence structures in complex multivariate systems. In this paper, we construct a TVDBN model, together with a score-based method for its structure learning. The model adopts a vector autoregressive (VAR) model to describe inter-slice and intra-slice relations between variables. By allowing VAR parameters to change segment-wisely over time, the time-varying dynamics of the network structure can be described. Furthermore, considering some external information can provide additional similarity information of variables. Graph Laplacian is further imposed to regularize similar nodes to have similar network structures. The regularized maximum a posterior estimation in the Bayesian inference framework is used as a score function for TVDBN structure evaluation, and the alternating direction method of multipliers (ADMM) with L-BFGS-B algorithm is used for optimal structure learning. Thorough simulation studies and a real case study are carried out to verify our proposed method’s efficacy and efficiency.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.