Spatial regression models are widely used in numerous areas, including detecting and predicting traffic volume, air pollution, and housing prices. Unlike conventional regression models, which commonly assume that observations are independent and identically distributed, existing spatial regression requires prior knowledge of the spatial dependency among observations at different spatial locations. Such a spatial dependency is typically predefined by domain experts or heuristics. However, without sufficient consideration of the context of the specific prediction task, it is prohibitively difficult to predefine the numerical values of the spatial dependency without bias. More importantly, in many situations, the existing techniques are insufficient to sense the complete connectivity and topological patterns among spatial locations (e.g., in underground water networks and human brain networks). Until now, these issues have been extremely difficult to address, and little attention has been paid to the automatic optimization of spatial dependency in relation to a prediction task, due to three challenges: (1) the necessity and complexity of modeling the spatial topological constraints; (2) incomplete prior spatial knowledge; and (3) the difficulty of optimizing under spatial topological constraints that are usually discrete or nonconvex. To address these challenges, this article proposes a novel convex framework that automatically and jointly learns the prediction mapping and spatial dependency based on spatial topological constraints. There are two different scenarios to be addressed. First, when prior knowledge on the existence of conditional independence among spatial locations is available (e.g., via spatial contiguity), we propose the first model, named Spatial-Autoregressive Dependency Learning I (SADL-I), to further quantify such spatial dependency.
However, when knowledge of the conditional independence is unavailable or incomplete, our second model, named Spatial-Autoregressive Dependency Learning II (SADL-II), is proposed to automatically learn the conditional independence pattern as well as quantify the numerical values of the spatial dependency based on spatial topological constraints. Topological constraints are usually discrete and nonconvex, which makes them extremely difficult to optimize together with the continuous optimization problems of spatial regression. To address this, we propose a convex and continuous equivalent of the original discrete topological constraints with a theoretical guarantee. The proposed models are then transformed into convex problems that can be iteratively optimized by our new efficient algorithms until convergence to a globally optimal solution. Extensive experimentation using several real-world datasets demonstrates the outstanding performance of the proposed models. The code of our SADL framework is available at: http://mason.gmu.edu/~lzhao9/materials/codes/SADL.
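For context, the conventional setting that the abstract contrasts against, a spatial autoregression whose dependency matrix is fixed in advance from spatial contiguity, can be sketched as below. This is not the SADL method: the textbook form y = rho * W y + X beta is assumed here, and the function names, toy adjacency, and parameter values are all hypothetical.

```python
import numpy as np


def row_normalize(adjacency):
    """Scale each row of a 0/1 contiguity matrix to sum to 1 (all-zero rows stay zero)."""
    adjacency = np.asarray(adjacency, dtype=float)
    row_sums = adjacency.sum(axis=1, keepdims=True)
    return np.divide(adjacency, row_sums,
                     out=np.zeros_like(adjacency), where=row_sums > 0)


def sar_predict(W, X, beta, rho):
    """Solve y = rho * W y + X beta for y, i.e. y = (I - rho W)^{-1} X beta."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta)


# Hypothetical toy data: four locations on a line; neighbors share an edge.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
W = row_normalize(adjacency)           # predefined spatial dependency
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, -0.5],
              [1.0, 2.0]])             # intercept plus one feature
beta = np.array([2.0, 1.0])            # assumed regression coefficients
rho = 0.4                              # assumed autoregressive strength

y = sar_predict(W, X, beta, rho)
```

In this sketch W is supplied by hand. The abstract's point is that such hand-set values are hard to choose without bias, which is the motivation for learning the dependency jointly with the prediction mapping.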
Abstract: The problem of traffic prediction is paramount in a plethora of applications, ranging from individual trip planning to urban planning. Existing work mainly focuses on traffic prediction on road networks. Yet, public transportation contributes a significant portion of overall human mobility and passenger volume. For example, the Washington, DC metro has on average 600,000 passengers on a weekday. In this work, we address the problem of modeling, classifying, and predicting such passenger volume in public transportation systems. We study the case of the Washington, DC metro, exploring fare card data and specifically passenger in- and outflow at stations. To reduce the dimensionality of the data, we apply principal component analysis to extract latent features for different stations and for different calendar days. Our unsupervised clustering results demonstrate that these latent features are highly discriminative. They allow us to derive different station types (residential, commercial, and mixed) and to effectively classify and identify the passenger flow of "unknown" stations. Finally, we also show that this classification can be applied to predict the passenger volume at stations. By learning latent features of stations for some time, we are able to predict the flow for the following hours. Extensive experimentation using a baseline neural network and two naïve periodicity approaches shows a considerable accuracy improvement when using the latent-feature-based approach.
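The pipeline described above (principal component analysis on station flow profiles, followed by unsupervised clustering of the latent features) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic hourly profiles rather than fare card records, k-means stands in for whichever clustering method the authors used, and the helper names are hypothetical.

```python
import numpy as np


def pca_features(data, n_components):
    """Project rows of `data` onto the top principal directions (via SVD)."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers.
        gaps = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = np.argmin(gaps, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Synthetic hourly entry profiles (assumed shapes, not real metro data).
hours = np.arange(24)
morning_peak = np.exp(-((hours - 8) ** 2) / 8.0)    # "residential" stations
evening_peak = np.exp(-((hours - 17) ** 2) / 8.0)   # "commercial" stations
rng = np.random.default_rng(0)
residential = morning_peak + 0.05 * rng.standard_normal((6, 24))
commercial = evening_peak + 0.05 * rng.standard_normal((6, 24))
flows = np.vstack([residential, commercial])        # 12 stations x 24 hours

features = pca_features(flows, n_components=2)      # latent station features
labels = kmeans(features, k=2)                      # inferred station types
```

On this toy data the two synthetic station types fall into separate clusters. Real fare card data would also need the mixed station type and the per-calendar-day features mentioned in the abstract.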
Collections of real-world data usually have implicit or explicit structural relations. For example, databases link records through foreign keys, and XML documents express associations between different values through syntax. Privacy preservation, until now, has focused either on data with a very simple structure, e.g., relational tables, or on data with a very complex structure, e.g., social network graphs, but has ignored intermediate cases, which are the most frequent in practice. In this work, we focus on tree-structured data. The paper defines k^(m,n)-anonymity, which provides protection against identity disclosure, and proposes a greedy anonymization heuristic that is able to sanitize large datasets. The algorithm and the quality of the anonymization are evaluated experimentally.
Machine learning techniques including neural networks are popular tools for chemical, physical and materials applications searching for viable alternative methods in the analysis of structure and energetics of systems ranging...