Urban traffic flow forecasting is essential to proactive traffic control and management. Most existing forecasting methods depend on proper and reliable input features, for example, weather conditions and spatiotemporal lagged variables of traffic flow. However, feature selection is often done manually without comprehensive evaluation, which leads to inaccurate results. To address that challenge, this paper presents an approach combining the bias-corrected random forests algorithm with a data-driven feature selection strategy for short-term urban traffic flow forecasting. First, several input features were extracted from traffic flow time series data. Then the importance of these features was quantified with the permutation importance measure. Next, a data-driven feature selection strategy was introduced to identify the most important features. Finally, the forecasting model was built on the bias-corrected random forests algorithm and the selected features. The proposed approach was validated with data collected from three types of urban roads (expressway, major arterial, and minor arterial) in Kunshan City, China. The proposed approach was also compared with 10 existing approaches to verify its effectiveness. The results of the validation and comparison show that, even without further model tuning, the proposed approach achieves the lowest average mean absolute error and root mean square error on six stations and the second-best average performance in mean absolute percentage error. Meanwhile, the training efficiency is improved compared with the original random forests method owing to the use of the feature selection strategy.
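The permutation importance step mentioned in this abstract can be illustrated with a minimal, model-agnostic sketch. It is not the paper's implementation: `toy_predict` is a hypothetical stand-in for a fitted forecaster (the paper uses bias-corrected random forests), the data are synthetic, and the 5% cut-off used for selection is an assumption, since the abstract does not specify the selection rule.

```python
import random

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MAE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mae(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_predict(row):
    # Hypothetical fitted model: uses two lagged flows, ignores the third feature.
    return 0.7 * row[0] + 0.2 * row[1]

# Synthetic rows: [flow at lag 1, flow at lag 2, unrelated noise feature].
data_rng = random.Random(42)
X = [[data_rng.uniform(100, 500), data_rng.uniform(100, 500), data_rng.uniform(0, 1)]
     for _ in range(50)]
y = [toy_predict(row) for row in X]

importances = permutation_importance(toy_predict, X, y)
# Assumed selection rule: keep features above 5% of the largest importance.
selected = [j for j, imp in enumerate(importances) if imp > 0.05 * max(importances)]
```

Because the stand-in model never reads the third feature, shuffling it leaves the error unchanged and the feature is dropped by the selection rule.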
A common way to estimate dynamic origin-destination (O-D) flows is to establish and solve a bilevel optimization model. Though numerous efforts have been devoted to effectively and efficiently solving the model, challenges still exist because of the interdependence of jointly solving the upper level O-D estimation and lower level traffic assignment problems and the nonconvexity of the model. This paper presents an alternative framework for estimating dynamic O-D flows using machine learning algorithms. The framework consists of three major modules: a learner that learns the dynamic mapping patterns describing the relationship between prior O-D flows and observed link flows, an assigner that assigns a given O-D matrix to different links based on the learner, and a searcher that iteratively searches the optimal O-D solution using the assigner. A convolutional neural network is designed as the learner and trained as the assigner. Next, the algorithms to estimate a regular O-D matrix and real-time O-D flows are separately developed by using the assigner and two designed genetic algorithms built as the searcher. The framework was evaluated with a realistic network in the downtown area of Kunshan, China. The experimental studies show that the framework can achieve satisfactory estimation performances in real time. Meanwhile, it takes raw flow ranges as the prior inputs, making it robust in the case of lacking an accurate target O-D matrix. Index Terms: Dynamic O-D estimation, bilevel optimization, convolutional neural network, genetic algorithm.
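The searcher idea in this abstract, a genetic algorithm that proposes O-D flows within prior ranges and scores them through the assigner, might be sketched as follows. Everything here is an assumption for illustration: the fixed `ASSIGNMENT` matrix stands in for the trained convolutional neural network, and the operators (elitism, uniform crossover, Gaussian mutation) are generic choices, not the two genetic algorithms designed in the paper.

```python
import random

# Hypothetical assigner: rows are links, columns are O-D pairs, and each
# entry is the share of an O-D flow that uses the link.
ASSIGNMENT = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [1.0, 1.0, 0.0],
]

def assign(od):
    """Map O-D flows to link flows (stand-in for the learned assigner)."""
    return [sum(a * f for a, f in zip(row, od)) for row in ASSIGNMENT]

def error(od, observed):
    """Squared error between assigned and observed link flows."""
    return sum((p - o) ** 2 for p, o in zip(assign(od), observed))

def search(observed, ranges, pop_size=30, generations=60, seed=0):
    """Elitist genetic search for O-D flows inside the given prior ranges."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda od: error(od, observed))
        history.append(error(pop[0], observed))
        elite = pop[: pop_size // 5]  # survivors carried over unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            k = rng.randrange(len(child))
            lo, hi = ranges[k]
            mutated = child[k] + rng.gauss(0, 0.05 * (hi - lo))
            child[k] = min(hi, max(lo, mutated))  # keep within the prior range
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda od: error(od, observed))
    history.append(error(best, observed))
    return best, history

true_od = [120.0, 80.0, 60.0]       # synthetic ground truth
observed = assign(true_od)           # synthetic link counts
ranges = [(0.0, 200.0)] * 3          # raw flow ranges used as the prior
best_od, history = search(observed, ranges)
```

Since the elite individuals survive each generation, the best error recorded in `history` never increases from one generation to the next.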
Short-term traffic flow forecasting is crucial for proactive traffic management and control. One key issue associated with the task is how to properly define and capture the temporal patterns of traffic flow. A feasible solution is to design a multi-regime strategy. In this paper, an effective approach to forecasting short-term traffic flow based on multi-regime modeling and ensemble learning is presented. First, to properly capture the different patterns of traffic flow dynamics, a regime identification model based on probabilistic modeling was developed. Each identified regime represents a specific traffic phase and was used as the representative feature for the forecasting model. Second, a forecasting model built on an ensemble learning strategy was developed, which integrates the forecasts of multiple regression trees. Traffic flow data over 5-min intervals collected from four I-80 freeway segments in California, USA, were used to evaluate the proposed approach. The experimental results show that the identified regimes explain the different traffic phases well and play an important role in forecasting. Furthermore, the developed forecasting model outperformed four typical models in terms of root mean square error (RMSE) and mean absolute percentage error (MAPE) on three traffic flow measures.
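For reference, the two error measures named in this abstract follow their standard definitions and can be computed as below. The sample values are made up for illustration, and skipping zero-valued observations in MAPE is an assumption of this sketch rather than something the abstract states.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Observations equal to zero are skipped (an assumption of this sketch)
    because the percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

# Made-up 5-min flow counts and forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
```

RMSE penalizes large absolute deviations, whereas MAPE is scale-free, so a model can rank differently under the two measures.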
Many analytical procedures, technical methods, and tools have been developed to facilitate manual inspection of traffic congestion and support the decision-making process for traffic authorities. However, without an automatic mechanism, day-to-day and location-by-location analysis is a time-consuming and labour-intensive process. This study presents a method based on a three-stage framework that is capable of automatically identifying and characterising spatiotemporal congested areas (STCAs) by parsing, extracting, analysing and quantifying the knowledge contained in traffic heatmaps. The key components of the proposed method are two unsupervised clustering procedures: (i) a mini-batch k-means clustering algorithm to separate the congested and non-congested areas and (ii) a graph-theory-based clustering algorithm to distinguish between different STCAs. Twenty weekdays of dual loop detector data collected from a 26-mile stretch of Interstate 10 in Phoenix, Arizona were analysed for the case study. The new method identified and quantified 102 STCAs without the need for human intervention. Based on 14 traffic measures calculated for each STCA, 19 active bottlenecks along the study corridor were identified. Top-ranked bottlenecks identified in this study were consistent with those reported in previous studies but were produced with less effort, demonstrating the new method's potential utility for traffic congestion management systems.
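The two clustering stages described in this abstract can be approximated with a small sketch. It makes several simplifying assumptions: plain two-cluster k-means on speed values replaces the mini-batch variant, connected components on a 4-neighbour grid stand in for the graph-theory-based clustering, and the heatmap values are invented.

```python
from collections import deque

def two_means_threshold(values, iterations=20):
    """1-D k-means with k=2; returns the midpoint between the two centres."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values fall into one cluster
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return (lo + hi) / 2

def label_congested_areas(heatmap, threshold):
    """Label connected groups of cells whose speed is below the threshold."""
    rows, cols = len(heatmap), len(heatmap[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if heatmap[r][c] >= threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first search over adjacent congested cells
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if (0 <= a < rows and 0 <= b < cols
                            and not labels[a][b] and heatmap[a][b] < threshold):
                        labels[a][b] = count
                        queue.append((a, b))
    return labels, count

# Invented speed heatmap (mph): rows are detector locations, columns are time intervals.
heatmap = [
    [65, 64, 30, 28, 66],
    [63, 25, 27, 62, 65],
    [66, 65, 64, 63, 20],
    [64, 66, 65, 22, 21],
]
threshold = two_means_threshold([v for row in heatmap for v in row])
labels, n_areas = label_congested_areas(heatmap, threshold)
```

In this toy heatmap the low-speed cells form two separate blocks, so the second stage assigns them two distinct labels while free-flow cells keep label 0.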
Determining the spatiotemporal impact areas of incidents plays a significant role in incident impact analysis. Although existing empirical methods have proven promising, they suffer from drawbacks that limit their wide application in automated freeway safety management. This study presents a data-driven approach to automatically determining the spatiotemporal impact areas of freeway incidents. The spatiotemporal contour plots were first constructed using three representative traffic measures. Next, a nonrecurrent congestion area identification method based on fuzzy clustering was developed. To distinguish possible multiple independent blocks in the nonrecurrent congestion area, a clustering algorithm based on graph theory was adopted. The incident impact areas were then determined by applying a postprocessing strategy. The incident records and the associated traffic flow data, collected on I-5 freeway segments in the San Diego region, CA, were used to evaluate the proposed approach. Experimental results show that the proposed approach can automatically and properly determine incident impact areas while accounting for the uncertainty resulting from traffic variations.
Ensembles of classifiers constitute one of the main current directions in machine learning and data mining. Ensemble methods are generally divided into static and dynamic ones. Dynamic ensemble methods use different classifiers for different samples and may therefore generalize better than static ensemble methods. However, most dynamic approaches based on the KNN rule require that an additional part of the training samples be set aside for estimating the "local classification performance" of each base classifier. When the number of training samples is insufficient, this lowers the accuracy of the trained model and makes the estimates of the base classifiers' local performance unreliable, which further hurts the integrated performance. This paper presents a new dynamic ensemble model that introduces cross-validation into the evaluation of local performance and then dynamically assigns a weight to each component classifier. Experimental results on 10 UCI data sets demonstrate that when the training set is not large enough, the proposed method achieves better performance than some dynamic ensemble methods as well as some classical static ensemble approaches.
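One way to read the idea in this abstract is sketched below: out-of-fold predictions from cross-validation record whether each base classifier was right on each training sample, so no separate hold-out set is needed, and a query is classified by weighting each classifier with its out-of-fold accuracy on the query's nearest training neighbours. The base learners (nearest-centroid classifiers on single features), the fold assignment, and the weighting rule are assumptions for illustration, not the paper's exact model.

```python
def centroid_learner(features):
    """Hypothetical base learner: nearest class centroid on the given features."""
    def fit(X, y):
        centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        def predict(x):
            return min(centroids, key=lambda lab: sum(
                (x[f] - c) ** 2 for f, c in zip(features, centroids[lab])))
        return predict
    return fit

def out_of_fold_correct(learners, X, y, n_folds=5):
    """correct[i][m] is 1 if learner m, trained without sample i, classifies it correctly."""
    n = len(X)
    correct = [[0] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(learners):
            predict = fit(X_train, y_train)
            for i in held_out:
                correct[i][m] = int(predict(X[i]) == y[i])
    return correct

def dynamic_predict(x, X, models, correct, k=3):
    """Weighted vote; weights are local out-of-fold accuracies near x."""
    nearest = sorted(range(len(X)), key=lambda i: sum(
        (a - b) ** 2 for a, b in zip(x, X[i])))[:k]
    weights = [sum(correct[i][m] for i in nearest) / k for m in range(len(models))]
    votes = {}
    for model, w in zip(models, weights):
        label = model(x)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get), weights

# Toy data: feature 0 separates the classes, feature 1 carries no class signal.
X = [[1, 5], [9, 5], [2, 1], [8, 1], [1.5, 9], [8.5, 9], [2.5, 3], [7.5, 3], [1, 7], [9, 7]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
learners = [centroid_learner([0]), centroid_learner([1])]

correct = out_of_fold_correct(learners, X, y)
models = [fit(X, y) for fit in learners]   # final models use all training samples
label, weights = dynamic_predict([2.0, 8.0], X, models, correct)
```

All training samples are used both to fit the final models and to estimate local performance, which is the point of replacing the hold-out set with cross-validation.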