2015
DOI: 10.1016/j.tcs.2015.01.026

Efficient sampling of non-strict turnstile data streams

Abstract: We study the problem of generating a large sample from a data stream S of elements (i, v), where i is a positive integer key, v is an integer equal to the count of key i, and the sample consists of pairs (i, C_i) for C_i = Σ_{(i,v)∈S} v. We consider strict turnstile streams and general non-strict turnstile streams, in which C_i may be negative. Our sample is useful for approximating both forward and inverse distribution statistics, within an additive error and provable success probability 1 − δ. Our sampling metho…

Cited by 9 publications (14 citation statements). References 29 publications.
“…Constructions of k-sample recovery mechanisms are known which require space Õ(k) and fail only with probability polynomially small in n [5]. We apply this algorithm to the neighborhood of vertices: for each node v, we can maintain an instance of the k-sample recovery sketch (or algorithm) to the vector corresponding to the row of the adjacency matrix for v. Note that as edges are inserted or deleted, we can propagate these to the appropriate k-sample recovery algorithms, without needing knowledge of the full neighborhood of nodes.…”
Section: Preliminaries
confidence: 99%
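The excerpt above describes keeping one k-sample recovery structure per vertex and routing each edge insertion or deletion to the two affected rows of the adjacency matrix. A minimal Python sketch of that bookkeeping follows; a toy exact dictionary stands in for the actual Õ(k)-space randomized sketch, and all class and method names here are illustrative, not from the cited papers:

```python
# Toy stand-in for a k-sample recovery sketch: exact, not sublinear in
# space, but exposing the same update/recover interface.
from collections import defaultdict

class ToyKSampleRecovery:
    def __init__(self, k):
        self.k = k
        self.counts = defaultdict(int)   # key -> signed count

    def update(self, key, delta):
        self.counts[key] += delta
        if self.counts[key] == 0:
            del self.counts[key]         # drop cancelled keys

    def recover(self):
        # Return up to k keys with nonzero count, as the sketch would.
        return list(self.counts)[: self.k]

class DynamicGraph:
    """One recovery structure per vertex row of the adjacency matrix."""
    def __init__(self, n, k):
        self.rows = [ToyKSampleRecovery(k) for _ in range(n)]

    def insert_edge(self, u, v):
        # Each edge update touches exactly two rows; no knowledge of the
        # full neighborhood is needed.
        self.rows[u].update(v, +1)
        self.rows[v].update(u, +1)

    def delete_edge(self, u, v):
        self.rows[u].update(v, -1)
        self.rows[v].update(u, -1)

g = DynamicGraph(n=5, k=2)
g.insert_edge(0, 1); g.insert_edge(0, 3); g.delete_edge(0, 1)
print(g.rows[0].recover())  # [3]
```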
“…Some increase in space usage seems inevitable as we have to store each individual sample; however, for better update times there are alternative solutions with faster processing times than the naive solution. In this direction, Barkay et al [9] have shown an L_0 sampler with O(log(s/δ)) update time and O(s log(s/δ)) sample extraction time, at the expense of relaxing the independence requirement of the samples. The extracted samples are guaranteed to be O(log(1/δ))-wise independent which is sufficient for most applications. (§ This approach of sampling and looking for duplicates may be folklore; it was described to the authors by T. S. Jayram.)…”
Section: Sampling Multiple Items
confidence: 99%
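An L_0 sampler of the kind mentioned above can be built from geometric subsampling plus 1-sparse recovery. The toy Python version below illustrates that standard template; it is not Barkay et al.'s construction, it omits the fingerprint check that rules out rare false positives, and every name in it is made up:

```python
import random

class ToyL0Sampler:
    def __init__(self, max_key, seed=0):
        self.levels = max_key.bit_length() + 1
        self.salt = random.Random(seed).getrandbits(64)
        self.s0 = [0] * self.levels  # per level: sum of counts
        self.s1 = [0] * self.levels  # per level: sum of key * count

    def update(self, key, delta):
        # A key survives to level j with probability about 2^-j,
        # determined by trailing zero bits of a salted hash.
        h = (hash((self.salt, key)) & 0xFFFFFFFF) | (1 << 32)
        lvl = min((h & -h).bit_length() - 1, self.levels - 1)
        for j in range(lvl + 1):
            self.s0[j] += delta
            self.s1[j] += key * delta

    def sample(self):
        # Scan from the sparsest level down for a vector that passes the
        # 1-sparse test; a real sampler confirms with a fingerprint.
        for j in range(self.levels - 1, -1, -1):
            if self.s0[j] != 0 and self.s1[j] % self.s0[j] == 0:
                key = self.s1[j] // self.s0[j]
                if key > 0:
                    return key, self.s0[j]
        return None  # failure, as real samplers allow with prob. delta

sampler = ToyL0Sampler(max_key=1000)
sampler.update(7, 3); sampler.update(42, 5); sampler.update(7, -3)
print(sampler.sample())  # (42, 5): key 7 cancelled, leaving key 42
```

Because updates only touch the per-level counters, update time is proportional to the number of levels, which is the kind of trade-off the excerpt contrasts against naive per-sample storage.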
“…Such algorithms have been designed to operate deterministically, and require O(k polylog n) space [1].…”
Section: Maximal Matching
confidence: 99%
“…Randomized constructions of k-sample algorithms are known (which use k-sparse recovery algorithms within them), and require O(k polylog n) space [1].…”
Section: Maximal Matching
confidence: 99%
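Both excerpts above lean on k-sparse recovery as the O(k polylog n)-space primitive inside k-sample algorithms. A hedged Python sketch of one standard randomized construction follows, hashing keys into O(k) buckets of 1-sparse testers across a few independent rows; the bucket and row counts are illustrative choices, not the parameters of [1]:

```python
import random

class KSparseRecovery:
    """Toy randomized k-sparse recovery: hash keys into 2k buckets per
    row and run a 1-sparse test per bucket."""
    def __init__(self, k, rows=4, seed=1):
        self.buckets = 2 * k
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(64) for _ in range(rows)]
        self.s0 = [[0] * self.buckets for _ in range(rows)]  # sums of counts
        self.s1 = [[0] * self.buckets for _ in range(rows)]  # sums of key*count

    def update(self, key, delta):
        for r, salt in enumerate(self.salts):
            b = hash((salt, key)) % self.buckets
            self.s0[r][b] += delta
            self.s1[r][b] += key * delta

    def recover(self):
        # A bucket holding exactly one surviving key passes the 1-sparse
        # test; re-hashing the candidate filters most collisions. Real
        # constructions add fingerprints to rule out false positives.
        found = {}
        for r, salt in enumerate(self.salts):
            for b in range(self.buckets):
                c = self.s0[r][b]
                if c != 0 and self.s1[r][b] % c == 0:
                    key = self.s1[r][b] // c
                    if key > 0 and hash((salt, key)) % self.buckets == b:
                        found[key] = c
        return found

rec = KSparseRecovery(k=3)
for key, delta in [(5, 2), (9, 1), (5, -2), (12, 7)]:
    rec.update(key, delta)
print(rec.recover())  # expected {9: 1, 12: 7}: key 5's updates cancelled
```

With at most k surviving keys and 2k buckets per row, each key is isolated in some row with good probability, which is why a constant number of independent rows suffices in such constructions.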