Understanding commuters’ behavior and the factors that influence it is becoming increasingly important. As the number of commuters rises steadily, commuter traffic has become a major bottleneck for many cities, and commuter behavior consequently plays a growing role in city and transport planning and in policy making. Although prior studies have investigated a variety of factors that may influence commuting decisions, most are constrained in data scale (limited time span, geographic coverage and number of commuters), largely because they depend on questionnaires or survey panel data. As a result, only small feature sets could be explored and, to the best of our knowledge, no predictions of commuter numbers have been made. To fill this gap, we collected inter-city commuting data for Germany between 1994 and 2018 and, together with other data sources, analyzed the influence of GDP, housing and the labor market on the decision to commute. Our analysis suggests that access to employment opportunities, housing prices, income and the distribution of a location’s industry sectors are important factors in commuting decisions. In addition, commuting patterns differ across age, gender and income groups. Using the identified features, we employed several machine learning algorithms to predict commuter numbers with reasonably good accuracy.
The class imbalance problem is prevalent in many domains, including medicine, natural language processing, image recognition, economics and geography. We perform a systematic experimental comparison of imbalanced classification algorithms, ranging from sampling, distance metric learning and cost-sensitive learning to ensemble learning approaches, on several datasets from UCI, KEEL and OpenML. The algorithms included are DDAE, MWMOTE, SMOTE, RUSBoost, AdaBoost, cost-sensitive decision tree (csDCT), self-paced Ensemble Classifier, MetaCost, CAdaMEC and Iterative Metric Learning (IML). Because the substantial bias that imbalanced classification can introduce is harmful to underrepresented classes, which can be of critical social and economic value and may even concern human lives, the main objective of our study is to understand how the imbalance ratio and the size of the datasets affect the performance of these algorithms. Our experiments show that 1) Sampling methods perform the worst and cannot be used directly for imbalanced classification, since they do not consider distance-based neighborhoods. However, some classifiers can be improved once the class distribution has been balanced. 2) Cost-sensitive learning models should be used when the dataset is less imbalanced, because it is difficult to set an appropriate cost matrix for a specific dataset, which can cause performance fluctuations. 3) IML consistently shows good performance (in terms of F1 and AUCPRC) and is resilient to different imbalance ratios, but is sensitive to the data distribution of the dataset. 4) Ensemble learning techniques generally perform better than other approaches because they combine the intelligence of multiple base classifiers. 5) In terms of system performance, self-paced Ensemble Classifier performs fairly well with regard to learning time, while IML and DDAE have the longest learning times; AdaBoost and self-paced Ensemble Classifier are the two algorithms with the lowest memory usage. Based on our analysis, we also provide empirical recommendations for algorithm selection under different requirements and usage scenarios.
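To make the evaluation terms used above concrete, the following is a minimal pure-Python sketch of the imbalance ratio, F1 and a step-wise estimate of the area under the precision-recall curve (average precision, one common way to compute AUCPRC). It is not the authors' evaluation code: the function names, the toy labels and the toy scores are all illustrative assumptions, and the sketch assumes binary labels with 1 as the minority class, both classes present, and no tied scores among the top-ranked samples.

```python
from typing import List


def imbalance_ratio(labels: List[int]) -> float:
    """Majority-class count divided by minority-class count (binary 0/1 labels)."""
    pos = sum(labels)
    neg = len(labels) - pos
    return max(pos, neg) / min(pos, neg)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 of the positive (minority) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    total_pos = sum(y_true)
    tp = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos


# Hypothetical toy data: 18 majority samples, 2 minority samples.
y_true = [0] * 18 + [1] * 2
# Hypothetical scores: one negative is ranked between the two positives.
scores = [0.8] + [0.1] * 17 + [0.9, 0.4]

all_negative = [0] * len(y_true)                 # trivial majority-class predictor
thresholded = [1 if s >= 0.5 else 0 for s in scores]

ratio = imbalance_ratio(y_true)
acc_trivial = accuracy(y_true, all_negative)
f1_trivial = f1_score(y_true, all_negative)
f1_thresholded = f1_score(y_true, thresholded)
ap = average_precision(y_true, scores)
```

On this toy data the trivial majority-class predictor reaches high accuracy while its minority-class F1 is zero, which is why metrics such as F1 and AUCPRC are preferred over accuracy when classes are imbalanced.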