Sharing trajectories is beneficial for many real-world applications, such as managing disease spread through contact tracing and tailoring public services to a population's travel patterns. However, public concern over privacy and data protection has limited the extent to which this data is shared. Local differential privacy enables data sharing in which users share a perturbed version of their data, but existing mechanisms fail to incorporate user-independent public knowledge (e.g., business locations and opening times, public transport schedules, geo-located tweets). This limitation makes mechanisms too restrictive, gives unrealistic outputs, and ultimately leads to low practical utility. To address these concerns, we propose a local differentially private mechanism that is based on perturbing hierarchically-structured, overlapping n-grams (i.e., contiguous subsequences of length n) of trajectory data. Our mechanism uses a multi-dimensional hierarchy over publicly available external knowledge of real-world places of interest to improve the realism and utility of the perturbed, shared trajectories. Importantly, including real-world public data does not negatively affect privacy or efficiency. Our experiments, using real-world data and a range of queries, each with real-world application analogues, demonstrate the superiority of our approach over a range of alternative methods.
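To make the notion of overlapping n-grams concrete, the following minimal Python sketch splits a trajectory into contiguous subsequences of length n. It illustrates only the decomposition step; the perturbation, the hierarchy, and the use of public knowledge described in the abstract are not shown. The function name `overlapping_ngrams` and the place labels are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def overlapping_ngrams(trajectory: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous subsequence of length n, in order.

    Consecutive n-grams overlap in n - 1 elements. A trajectory shorter
    than n yields an empty list.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [tuple(trajectory[i:i + n]) for i in range(len(trajectory) - n + 1)]


# Hypothetical trajectory expressed as a sequence of visited places.
trip = ["home", "cafe", "office", "gym"]
bigrams = overlapping_ngrams(trip, 2)
# bigrams == [("home", "cafe"), ("cafe", "office"), ("office", "gym")]
```

Because neighbouring n-grams share elements, each interior point of the trajectory appears in several n-grams.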
Sharing sensitive data is vital in enabling many modern data analysis and machine learning tasks. However, current methods for data release are insufficiently accurate or granular to provide meaningful utility, and they carry a high risk of deanonymization or membership inference attacks. In this paper, we propose a differentially private synthetic data generation solution with a focus on the compelling domain of location data. We present two methods with high practical utility for generating synthetic location data from real locations, both of which protect the existence and true location of each individual in the original dataset. Our first, partitioning-based approach introduces a novel method for privately generating point data using kernel density estimation, in addition to employing private adaptations of classic statistical techniques, such as clustering, for private partitioning. Our second, network-based approach incorporates public geographic information, such as the road network of a city, to constrain the bounds of synthetic data points and hence improve the accuracy of the synthetic data. Both methods satisfy the requirements of differential privacy, while also enabling accurate generation of synthetic data that aims to preserve the distribution of the real locations. We conduct experiments using three large-scale location datasets to show that the proposed solutions generate synthetic location data with high utility and strong similarity to the real datasets. We highlight some practical applications for our work by applying our synthetic data to a range of location analytics queries, and we demonstrate that our synthetic data produces near-identical answers to the same queries compared to when real data is used. Our results show that the proposed approaches are practical solutions for sharing and analyzing sensitive location data privately.
CCS CONCEPTS • Information systems → Spatial-temporal systems; Data analytics; • Security and privacy → Privacy protections.
Synthetic data generation is a fundamental task for many data management and data science applications. Spatial data is of particular interest, and its sensitive nature often leads to privacy concerns. We introduce GeoPointGAN, a novel GAN-based solution for generating synthetic spatial point datasets with high utility and strong individual level privacy guarantees. GeoPointGAN's architecture includes a novel point transformation generator that learns to project randomly generated point co-ordinates into meaningful synthetic co-ordinates that capture both microscopic (e.g., junctions, squares) and macroscopic (e.g., parks, lakes) geographic features. We provide our privacy guarantees through label local differential privacy, which is more practical than traditional local differential privacy. We seamlessly integrate this level of privacy into GeoPointGAN by augmenting the discriminator to the point level and implementing a randomized response-based mechanism that flips the labels associated with the 'real' and 'fake' points used in training. Extensive experiments show that GeoPointGAN significantly outperforms recent solutions, improving by up to 10 times compared to the most competitive baseline. We also evaluate GeoPointGAN using range, hotspot, and facility location queries, which confirm the practical effectiveness of GeoPointGAN for privacy-preserving querying. The results illustrate that a strong level of privacy is achieved with little-to-no adverse utility cost, which we explain through the generalization and regularization effects that are realized by flipping the labels of the data during training.
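The label-flipping idea can be illustrated with standard binary randomized response. The sketch below is not GeoPointGAN's implementation: the abstract states only that a randomized response-based mechanism flips the 'real' and 'fake' labels, so the flip probability 1 / (1 + e^ε) used here is the textbook choice for a binary label and is an assumption, as are the function names.

```python
import math
import random


def flip_probability(epsilon: float) -> float:
    """Probability of reporting the opposite label under binary
    randomized response with privacy parameter epsilon (assumed form)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(label: int, epsilon: float, rng: random.Random) -> int:
    """Return the 0/1 label unchanged with probability e^eps / (1 + e^eps),
    and flipped otherwise."""
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    if rng.random() < flip_probability(epsilon):
        return 1 - label
    return label


# Hypothetical usage: 1 marks a 'real' point, 0 a 'fake' (generated) point.
rng = random.Random(0)
noisy_labels = [randomized_response(1, epsilon=1.0, rng=rng) for _ in range(5)]
```

With ε = 0 each label is flipped half the time, so the reported label carries no information; as ε grows, the flip probability approaches zero and the labels become almost exact.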
Shortest path queries over graphs are usually considered as isolated tasks, where the goal is to return the shortest path for each individual query. In practice, however, such queries are typically part of a system (e.g., a road network) and their execution dynamically affects other queries and network parameters, such as the loads on edges, which in turn affects the shortest paths. We study the problem of collectively processing shortest path queries, where the goal is to optimize a collective objective, such as minimizing the overall cost. We define a temporal load-aware network that dynamically tracks expected loads while satisfying the desirable 'first in, first out' property. We develop temporal load-aware extensions of widely used shortest path algorithms, and a scalable collective routing solution that seeks to reduce system-wide congestion through dynamic path reassignment. Experiments illustrate that our collective approach to this NP-hard problem achieves improvements in a variety of performance measures, such as (i) reducing average travel times by up to 63%, (ii) producing fairer suggestions across queries, and (iii) distributing load across up to 97% of a city's road network capacity. The proposed approach is generalizable, which allows it to be adapted for other concurrent query processing tasks over networks.
When a person chooses a healthcare provider, they are trading off cost, convenience, and a latent third factor: “perceived quality”. In urban areas of lower- and middle-income countries (LMICs), including slums, individuals have a wide range of choice in healthcare provider, and we hypothesised that people do not choose the nearest and cheapest provider. This would mean that people are willing to incur additional cost to visit a provider they would perceive to be offering better healthcare. In this article, we aim to develop a method towards quantifying this notion of “perceived quality” by using a generalised access cost calculation to combine monetary and time costs relating to a visit, and then using this calculated access cost to observe facilities that have been bypassed. The data to support this analysis comes from detailed survey data in four slums, where residents were questioned on their interactions with healthcare services, and providers were surveyed by our team. We find that people tend to bypass more informal local services to access more formal providers, especially public hospitals. This implies that public hospitals, which tend to incur higher access costs, have the highest perceived quality (i.e., people are more willing to trade cost and convenience to visit these services). Our findings therefore provide evidence that can support the ‘crowding out’ hypothesis first suggested in a 2016 Lancet Series on healthcare provision in LMICs.
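A generalised access cost and the resulting notion of a bypassed facility can be sketched as follows. The abstract does not give the cost formula, so the linear form (fee plus a value of time multiplied by travel time), the `Facility` fields, and all numbers below are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Facility:
    name: str
    fee: float             # monetary cost of a visit (hypothetical units)
    travel_minutes: float  # time cost of reaching the facility


def access_cost(facility: Facility, value_of_time: float) -> float:
    """Assumed generalised cost: fee plus travel time priced at
    value_of_time (monetary units per minute)."""
    return facility.fee + value_of_time * facility.travel_minutes


def bypassed(chosen: Facility, options: List[Facility], value_of_time: float) -> List[Facility]:
    """Facilities that were strictly cheaper to access than the one chosen."""
    chosen_cost = access_cost(chosen, value_of_time)
    return [f for f in options
            if f != chosen and access_cost(f, value_of_time) < chosen_cost]


# Hypothetical example: a resident travels to a public hospital.
clinic = Facility("clinic", fee=1.0, travel_minutes=5)
pharmacy = Facility("pharmacy", fee=0.5, travel_minutes=20)
hospital = Facility("hospital", fee=2.0, travel_minutes=40)
skipped = bypassed(hospital, [clinic, pharmacy, hospital], value_of_time=0.1)
# skipped contains the clinic and the pharmacy, both cheaper to access.
```

Under this reading, the more low-cost facilities a person bypasses to reach a provider, the stronger the evidence that they perceive that provider to offer higher quality.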
Understanding the cost of accessing services in a transit network, and how this varies spatially and temporally, is vital for transport agencies to make effective decisions. However, to understand this at the city scale typically demands the computation of a very large number of shortest path queries, which is computationally infeasible in a practical setting. In this work we define the notion of an access query, an analytical query which returns the aggregate access costs to a set of points of interest within a given time interval. To solve the computational bottleneck, we develop a solution that uses semi-supervised machine learning to efficiently compute these aggregate access costs using a gravity model. The solution dynamically generates a descriptive representation of the connectivity between origins and destinations in a multi-modal network, and dynamically labels a small subset of the overall trips which are used to form a target vector for the learning algorithm. We also consider the fair distribution of access across spatio-temporal dimensions. The solution can reduce processing times by up to 97%, while maintaining high levels of accuracy; the predicted journey times to services are accurate to within 3.3 minutes, and a high level of correlation (85%) to the ground truth is achieved.