Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data 1994
DOI: 10.1145/191839.191886

Quickly generating billion-record synthetic databases

Abstract: Evaluating database system performance often requires generating synthetic databases, ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses: (1) Parallelism to get generation speedup and scaleup. (2) …

Cited by 203 publications (60 citation statements)
References 8 publications
“…We in the computer science community have traditionally focused on scaling in size: how to efficiently manipulate large disk-bound data via suitable data structures [213], how to scale to databases of petabytes [114], synthesize massive data sets [115], etc. However, far less attention has been given to benchmarking, studying performance of systems under rapid updates with near-real time analyses.…”
Section: The Data Stream Phenomenon
Citation type: mentioning
confidence: 99%
“…We need to incorporate both the true data via r_i/W as well as our most pessimistic belief of the underlying skew. As a pessimistic prior, we choose the highly skewed Gray's self-similar distribution [20], often used for the 80/20 rule. Only if we find a sequence which can not be explained (with more than 1% chance) with the 80/20 distribution, we believe we have encountered list walking.…”
Section: A Detecting Lists
Citation type: mentioning
confidence: 99%
“…An important milestone was the paper by Gray et al [12], the authors showed how to generate data sets with different distributions and dense unique sequences in linear time and in parallel. Fast, parallel generation of data with special distribution characteristics is the foundation of our data generation approach.…”
Section: Related Work
Citation type: mentioning
confidence: 99%