Mobile data traffic is growing exponentially, and distributing content efficiently is even more challenging when users are "on the move", for example on public transport. Accessing content (e.g. videos) on mobile devices while commuting is both expensive and unreliable, although it is becoming common practice worldwide. Leveraging the spatial and temporal correlation of content popularity and users' diverse network connectivity, we propose uStash, a novel content distribution system that guarantees better QoE with regard to access delay and cost of usage. The proposed collaborative download and content stashing schemes give the uStash provider the flexibility to control the cost of content access via cellular networks. We model the uStash system in a probabilistic framework and thereby analytically derive the optimal portions for collaborative downloading. We then validate the proposed models using real-life trace-driven simulations. In particular, we use datasets from 22 inter-city buses running on 6 different routes and from a mobile VoD service provider to show that uStash reduces the monthly cellular data cost by approximately 50% and the expected content access delay by 60%, compared to downloading content over users' own cellular connections.
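The abstract does not spell out the probabilistic model, so the snippet below is only a toy illustration of the idea behind collaborative downloading, assuming a deterministic setting: one content item of known size is split among co-located passengers whose cellular download rates are known and constant. The function names `proportional_split` and `completion_time` are hypothetical. Splitting in proportion to rate makes every passenger finish at the same moment, which minimises the time until the last portion arrives; it is not the optimal portioning derived in the paper, which is obtained within a probabilistic framework.

```python
from typing import List


def proportional_split(size_mb: float, rates_mb_per_s: List[float]) -> List[float]:
    """Split one content item across co-located users in proportion to their
    (hypothetical, constant) cellular download rates."""
    total_rate = sum(rates_mb_per_s)
    if total_rate <= 0:
        raise ValueError("at least one user needs a positive download rate")
    return [size_mb * r / total_rate for r in rates_mb_per_s]


def completion_time(portions_mb: List[float], rates_mb_per_s: List[float]) -> float:
    """Time until the last non-empty portion has been downloaded."""
    return max((p / r for p, r in zip(portions_mb, rates_mb_per_s) if p > 0), default=0.0)


rates = [1.0, 2.0, 3.0]                                   # MB/s per passenger (assumed)
portions = proportional_split(300.0, rates)               # [50.0, 100.0, 150.0]
t_prop = completion_time(portions, rates)                 # 50.0 seconds
t_equal = completion_time([100.0, 100.0, 100.0], rates)   # 100.0 seconds
```

Here a 300 MB video split across rates of 1, 2 and 3 MB/s finishes after 50 s, whereas an equal 100 MB split takes 100 s because the slowest passenger becomes the bottleneck.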
We propose a generic mechanism to efficiently release differentially private synthetic versions of high-dimensional datasets with high utility. The core technique in our mechanism is the use of copulas, which are functions representing dependencies among random variables with a multivariate distribution. Specifically, we use the Gaussian copula to define dependencies of attributes in the input dataset, whose rows are modelled as samples from an unknown multivariate distribution, and then sample synthetic records through this copula. Despite the inherently numerical nature of Gaussian correlations, we construct a method that applies to numerical and categorical attributes alike. Our mechanism is efficient in that its running time is proportional to the square of the number of attributes in the dataset. We propose a differentially private way of constructing the Gaussian copula without compromising computational efficiency. Through experiments on three real-world datasets, we show that we can obtain highly accurate answers to the set of all one-way marginal queries and two- and three-way positive conjunction queries, with 99% of the query answers having absolute (fractional) error rates between 0.01% and 3%. Furthermore, for a majority of two-way and three-way queries, we outperform independent noise addition through the well-known Laplace mechanism. In terms of computational time, our mechanism outputs synthetic datasets in around 6 minutes 47 seconds on average for an input dataset of about 200 binary attributes and more than 32,000 rows, and in about 2 hours 30 minutes for a much larger dataset of about 700 binary attributes and more than 5 million rows. To further demonstrate scalability, we ran the mechanism on larger (artificial) datasets with 1,000 and 2,000 binary attributes (and 5 million rows), obtaining synthetic outputs in approximately 6 and 19 hours, respectively. These running times are entirely feasible for synthetic datasets, which are one-off releases.
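To make the copula idea concrete, here is a minimal, non-private Gaussian copula synthesiser in Python with NumPy. It is a sketch under stated assumptions, not the authors' mechanism: the differentially private noise that the paper adds when constructing the copula is omitted, attributes are assumed to be integer-coded (categorical values mapped to integers, which imposes an arbitrary order), and ties in the ranks are broken at random. Each column is converted to normal scores via its empirical CDF, the d × d correlation matrix of those scores is estimated (this pairwise step is where a cost quadratic in the number of attributes d appears), correlated normal vectors are sampled, and each coordinate is mapped back through the inverse empirical CDF of the original column. The names `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1
_phi_inv = np.vectorize(_STD_NORMAL.inv_cdf)
_phi = np.vectorize(_STD_NORMAL.cdf)


def fit_gaussian_copula(data: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Estimate the Gaussian-copula correlation matrix of an (n, d) integer array.
    NOTE: no differential-privacy noise is added in this sketch."""
    n, d = data.shape
    scores = np.empty((n, d))
    for j in range(d):
        # Sort primarily by value, breaking ties at random (last key is primary).
        order = np.lexsort((rng.random(n), data[:, j]))
        ranks = np.empty(n)
        ranks[order] = np.arange(1, n + 1)
        # Pseudo-observations in (0, 1), then normal scores.
        scores[:, j] = _phi_inv(ranks / (n + 1))
    # d x d matrix of pairwise correlations: quadratic in the number of attributes.
    return np.corrcoef(scores, rowvar=False)


def sample_gaussian_copula(data: np.ndarray, corr: np.ndarray, m: int,
                           rng: np.random.Generator) -> np.ndarray:
    """Draw m synthetic rows whose dependence follows `corr` and whose
    marginals follow the empirical marginals of `data`."""
    n, d = data.shape
    z = rng.multivariate_normal(np.zeros(d), corr, size=m)
    u = _phi(z)  # uniform marginals, Gaussian dependence
    synthetic = np.empty((m, d), dtype=data.dtype)
    for j in range(d):
        sorted_col = np.sort(data[:, j])
        # Inverse empirical CDF: smallest observed value whose CDF is >= u.
        idx = np.clip(np.ceil(u[:, j] * n).astype(int) - 1, 0, n - 1)
        synthetic[:, j] = sorted_col[idx]
    return synthetic


rng = np.random.default_rng(0)
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=5000)
data = (latent > np.array([0.5, -0.2])).astype(int)  # two correlated binary attributes
corr = fit_gaussian_copula(data, rng)
synth = sample_gaussian_copula(data, corr, 5000, rng)
```

Because the back-transformation uses each column's empirical quantiles, one-way marginals are preserved up to sampling error, while pairwise dependence is carried only through the correlation matrix. With binary attributes and random tie-breaking that correlation is attenuated, so this sketch should not be read as reproducing the accuracy figures reported in the abstract.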
We show that the ‘optimal’ use of the parallel composition theorem corresponds to finding the size of the largest subset of queries that ‘overlap’ on the data domain, a quantity we call the maximum overlap of the queries. It has previously been shown that a certain instance of this problem, formulated in terms of determining the sensitivity of the queries, is NP-hard, but also that graph-theoretic algorithms, such as finding the maximum clique, can be used to approximate query sensitivity. In this paper, we consider a significant generalization of that instance which encompasses both a wider range of differentially private mechanisms and a broader class of queries. We show that for a particular class of predicate queries, determining whether they are disjoint can be done in time polynomial in the number of attributes. For this class, we show that the maximum overlap problem remains NP-hard as a function of the number of queries. However, we show that efficient approximate solutions exist by relating the maximum overlap to the clique and chromatic numbers of a certain graph determined by the queries. The link to the chromatic number allows us to use more efficient approximation algorithms, which is not possible for the clique number, since approximating it may underestimate the privacy budget. Our approach is defined in the general setting of f-differential privacy, which subsumes standard pure differential privacy and Gaussian differential privacy. We prove the parallel composition theorem for f-differential privacy. We evaluate our approach on synthetic and real-world datasets of queries. We show that the approach can scale to large domain sizes (up to 10^20000) and that its application can reduce the noise added to query answers by up to 60%.
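The sketch below illustrates the graph-theoretic argument in Python, under assumptions that are ours rather than the paper's: each query is a conjunction of per-attribute set-membership predicates (an attribute name mapped to a non-empty set of allowed values), neighbouring datasets differ by adding or removing one record, and the mechanism is pure ε-differential privacy with Laplace noise on counting queries rather than general f-differential privacy. Two such queries are disjoint exactly when some attribute constrained by both has non-intersecting allowed sets, a check that is roughly linear in the number of attributes (times the sizes of the value sets). A greedy, largest-degree-first colouring of the overlap graph uses at least as many colours as the chromatic number, which in turn is at least the clique number and the maximum overlap, so it never underestimates the budget. All function names are hypothetical, and the brute-force `max_overlap_exact` is exponential and only meant for tiny sanity checks.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical query class: attribute -> non-empty set of allowed values.
# Attributes absent from the dict are unconstrained.
Query = Dict[str, FrozenSet[str]]


def q(**allowed) -> Query:
    """Convenience constructor, e.g. q(A={"a", "b"})."""
    return {attr: frozenset(vals) for attr, vals in allowed.items()}


def disjoint(q1: Query, q2: Query) -> bool:
    """True iff no record can satisfy both queries: some attribute constrained
    by both has non-intersecting allowed sets."""
    return any(not (q1[a] & q2[a]) for a in q1.keys() & q2.keys())


def overlap_graph(queries: List[Query]) -> Dict[int, Set[int]]:
    """Vertices are queries; an edge joins every non-disjoint pair."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(queries))}
    for i, j in combinations(range(len(queries)), 2):
        if not disjoint(queries[i], queries[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_colours(adj: Dict[int, Set[int]]) -> int:
    """Number of colours used by largest-degree-first greedy colouring.
    This is >= chromatic number >= clique number >= maximum overlap."""
    colour: Dict[int, int] = {}
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return max(colour.values(), default=-1) + 1


def common_record_exists(qs: List[Query]) -> bool:
    """All queries in qs match some common record iff, for every attribute,
    the allowed sets of the queries constraining it intersect."""
    attrs = set().union(*(qq.keys() for qq in qs))
    return all(frozenset.intersection(*(qq[a] for qq in qs if a in qq)) for a in attrs)


def max_overlap_exact(queries: List[Query]) -> int:
    """Brute force over subsets (exponential): tiny instances only."""
    n = len(queries)
    for k in range(n, 0, -1):
        if any(common_record_exists([queries[i] for i in s])
               for s in combinations(range(n), k)):
            return k
    return 0


def laplace_scale(overlap_bound: int, epsilon: float) -> float:
    """Per-query Laplace scale for counting queries when adding or removing one
    record changes at most `overlap_bound` answers, each by 1."""
    return overlap_bound / epsilon


QUERIES = [q(A={"a", "b"}), q(A={"b", "c"}), q(A={"a", "c"}), q(A={"d"}, B={"x"})]
colour_bound = greedy_colours(overlap_graph(QUERIES))  # 3
exact = max_overlap_exact(QUERIES)                     # 2
```

In this example the three queries on attribute A overlap pairwise but share no common value, so the colouring bound (3) exceeds the exact maximum overlap (2). Both improve on treating all four queries as overlapping, which would set the Laplace scale to 4/ε instead of 3/ε or 2/ε.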