Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures 2015
DOI: 10.1145/2755573.2755575

Communication-Efficient Computation on Distributed Noisy Datasets

Abstract: This paper gives a first attempt to answer the following general question: Given a set of machines connected by a point-to-point communication network, each having a noisy dataset, how can we perform communication-efficient statistical estimations on the union of these datasets? Here 'noisy' means that a real-world entity may appear in different forms in different datasets, but those variants should be considered as the same universe element when performing statistical estimations. We give a first set of commu…
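To make the abstract's notion of 'noisy' concrete, here is a minimal sketch of counting distinct entities when variants of the same entity are merged. It is not the paper's algorithm. It assumes, purely for illustration, that records are points in the plane, that two records within a distance threshold `alpha` are variants of one entity, and that a greedy pass over the union is acceptable. The function name `robust_distinct_count` and the sample data are hypothetical.

```python
from math import dist


def robust_distinct_count(points, alpha):
    """Greedily count groups of near-duplicate points.

    Assumption (not from the paper): a point is a variant of an existing
    entity if it lies within distance `alpha` of that entity's first-seen
    representative.
    """
    representatives = []
    for p in points:
        # Start a new entity only if p is far from every representative so far.
        if not any(dist(p, r) <= alpha for r in representatives):
            representatives.append(p)
    return len(representatives)


# Two machines hold slightly different forms of the same entities.
machine_a = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
machine_b = [(0.05, 0.05), (5.1, 4.9), (9.0, 9.0)]
union = machine_a + machine_b

print(len(set(union)))                          # exact-match count: 6
print(robust_distinct_count(union, alpha=0.5))  # near-duplicates merged: 3
```

This sketch gathers all records in one place, which is exactly the cost that a communication-efficient protocol tries to avoid. It only illustrates what quantity is being estimated.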

Cited by 8 publications (5 citation statements)
References 49 publications
“…A number of statistical problems (F0, ℓ0-sampling, heavy hitters, etc.) were studied in the distributed model under the same noisy data model [36]. Unfortunately the multiround algorithms designed in the distributed model cannot be used in the data stream model because on data streams we can only scan the whole dataset once without looking back.…”
Section: arXiv:1810.12388v1 [cs.DS] 29 Oct 2018
mentioning
confidence: 99%
“…While we loosely motivated our search for approximate solutions by noise in the introduction, in other problems noise is a major concern and explicitly addressed. For example, consider streaming algorithms for estimating statistical parameters like frequency moments [13]. In such problems, certain elements from the universe may appear in different forms due to noise and thus, should actually be treated as the same element.…”
Section: Related Work
mentioning
confidence: 99%
“…As far as we have concerned, the distinct element problem has not been studied in the noisy streaming data setting. Very recently, statistical estimations for noisy well-shaped datasets have been studied in the distributed setting for several basic problems, including distinct elements, ℓ0-sampling, frequency moments, heavy hitters and empirical entropy [30], in the general metric space, but the algorithms in [30] cannot be applied to the streaming setting since all of them need a "second look" at the dataset. On the other hand, our streaming algorithms can be trivially translated to algorithms for distributed data: k parties process their local datasets using the streaming algorithm in turn following a fixed order, and then send their memory configurations to their successors; the last party outputs the answer.…”
mentioning
confidence: 99%
“…On the other hand, our streaming algorithms can be trivially translated to algorithms for distributed data: k parties process their local datasets using the streaming algorithm in turn following a fixed order, and then send their memory configurations to their successors; the last party outputs the answer. In particular, by such a translation we can obtain a distributed robust F0 algorithm with communication cost of O(k/ε^2) words for datasets in the Euclidean space, improving the generic algorithm in [30] by a factor of 1/ε · poly log m (m = |S| is the length of the stream).…”
mentioning
confidence: 99%
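The streaming-to-distributed translation described in the last two statements can be sketched as a simple simulation. This is a hedged illustration, not code from any of the cited papers. `ExactDistinctCounter` is a hypothetical toy stand-in for a streaming algorithm (it stores every item, so it is neither small-space nor noise-robust), `run_chain` is an assumed name, and `copy.deepcopy` stands in for sending a memory configuration over the network.

```python
import copy


class ExactDistinctCounter:
    """Toy stand-in for a streaming algorithm (assumption: exact, not a sketch)."""

    def __init__(self):
        self.seen = set()

    def update(self, item):
        self.seen.add(item)

    def estimate(self):
        return len(self.seen)


def run_chain(local_datasets, make_summary):
    """Simulate k parties processing their data in a fixed order.

    Each party resumes from its predecessor's memory configuration, streams
    its local dataset through it, and forwards the result to its successor.
    The last party outputs the answer.
    """
    state = make_summary()  # empty memory configuration held by the first party
    messages_sent = 0
    for i, dataset in enumerate(local_datasets):
        for item in dataset:
            state.update(item)
        if i < len(local_datasets) - 1:
            # deepcopy models serializing the state and sending it onward.
            state = copy.deepcopy(state)
            messages_sent += 1
    return state.estimate(), messages_sent


parties = [["a", "b"], ["b", "c"], ["c", "d", "a"]]
answer, messages = run_chain(parties, ExactDistinctCounter)
print(answer, messages)  # 4 distinct items, 2 messages for k = 3 parties
```

With k parties the chain sends k - 1 messages, each the size of one memory configuration, so the total communication is roughly k times the streaming algorithm's space usage.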