Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/93597.98746

Random sampling from hash files

Cited by 27 publications (28 citation statements)
References 7 publications

“…However, all variants of reservoir sampling require overwriting random sample items in R, and such overwrites are expensive in flash (see Section 7). Olken and Rotem [18] present techniques for constructing samples in a database environment. However, in addition to not being designed for flash media, the techniques assume we are sampling from disk-resident, indexed data.…”
Section: Related Work
confidence: 99%
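For context on the overwrite cost noted in the excerpt above: in reservoir sampling (Algorithm R), every item accepted after the reservoir is full replaces a uniformly chosen slot of the sample buffer R in place. A minimal Python sketch, with the stream, k, and variable names chosen here for illustration rather than taken from the cited papers:

    import random

    def reservoir_sample(stream, k):
        """Keep a uniform random sample of k items from a stream (Algorithm R)."""
        R = []
        for i, item in enumerate(stream):
            if i < k:
                R.append(item)              # fill the reservoir first
            else:
                j = random.randint(0, i)    # uniform index in [0, i]
                if j < k:
                    R[j] = item             # overwrite a random slot of R
        return R

The assignment R[j] = item is the random in-place overwrite that the excerpt identifies as expensive on flash media.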
“…One possible approach would be to adapt Olken and Rotem's procedure of batch sampling from a hashed file [18]. The basic idea is first to determine how many samples need to be drawn from each bucket (using a multinomial distribution), and then to draw the target number of samples from each bucket with the acceptance/rejection algorithm or the reservoir sampling algorithm.…”
Section: Random Subsampling
confidence: 99%
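A rough sketch of the two-step batch procedure described in the excerpt above, assuming the per-bucket record counts are known in advance. For simplicity, the per-bucket step here just draws uniformly from the bucket contents; Olken and Rotem's method would instead use acceptance/rejection or reservoir sampling when a bucket has to be scanned. Function and variable names are illustrative:

    import random
    import numpy as np

    def batch_sample_hashed_file(buckets, s):
        """Draw s samples (with replacement) from a hashed file in two steps."""
        sizes = np.array([len(b) for b in buckets], dtype=float)
        probs = sizes / sizes.sum()

        # Step 1: allocate the s samples across buckets with a multinomial draw,
        # weighting each bucket by its share of the records.
        counts = np.random.multinomial(s, probs)

        # Step 2: draw the allocated number of samples from each bucket.
        sample = []
        for bucket, c in zip(buckets, counts):
            sample.extend(random.choices(bucket, k=int(c)))
        return sample

Because the multinomial allocation touches each bucket at most once, the buckets can be read sequentially, which is the main attraction of the batch formulation over drawing one record at a time.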
“…Several papers studied techniques for random sampling from B-trees [24,21,20]. Most of these assume the tree is balanced, and are therefore not efficient for highly unbalanced trees, as is the case with the suggestion TRIE.…”
Section: Related Work
confidence: 99%
“…In order to produce random samples from such a materialized view, we can employ iterative or batch sampling techniques [16], [18]- [21] that sample directly from a relational selection predicate, thus avoiding the aforementioned problem of obtaining too few relevant records in the sample. Olken [19] presents a comprehensive analysis and comparison of many such techniques.…”
Section: B. Sampling From Indices
confidence: 99%
“…The classic work in this area (by Olken and his co-authors [16]- [18]) suffers from a key drawback: each record sampled from a database file requires a random disk I/O. At a current rate of around 100 random disk I/Os per second per disk, this means that it is possible to retrieve only 6,000 samples per minute.…”
Section: Introduction
confidence: 99%