Susanne Englert scite author profile

Evaluating database system performance often requires generating synthetic databases: ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses:
(1) Parallelism to get generation speedup and scaleup.
(2) Congruential generators to get dense unique uniform distributions.
(3) Special-case discrete logarithms to generate indices concurrent to the base table generation.
(4) Modification of (2) to get exponential, normal, and self-similar distributions.
The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors, with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems.
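Technique (2) in the abstract can be illustrated with a short sketch. It assumes one common construction, a multiplicative congruential generator modulo a prime P larger than N, with values above N discarded; the abstract does not give the paper's actual code, so the names `dense_unique` and `is_primitive_root` and the parameter choices are hypothetical.

```python
def is_primitive_root(g, p):
    """Return True if g generates the multiplicative group modulo the prime p."""
    order = p - 1
    # Collect the distinct prime factors of p - 1 by trial division.
    factors = set()
    n, d = order, 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    # g is a primitive root iff g^((p-1)/q) != 1 for every prime factor q.
    return all(pow(g, order // q, p) != 1 for q in factors)


def dense_unique(n, p, g, seed=1):
    """Yield every integer in 1..n exactly once, in a scrambled order.

    Assumes p is a prime greater than n (primality is not checked here)
    and g is a primitive root modulo p.
    """
    if p <= n or not is_primitive_root(g, p):
        raise ValueError("need prime p > n and a primitive root g mod p")
    x = seed
    for _ in range(p - 1):      # one full cycle visits 1..p-1 once each
        x = (x * g) % p
        if x <= n:              # discard values outside the dense range
            yield x


keys = list(dense_unique(10, 11, 2))
print(keys)
```

Because the generator walks one full cycle of the nonzero residues, no key repeats and none is skipped, so no duplicate check or sort is needed. Different seeds give different starting points in the same cycle, which is one way separate workers could each produce a disjoint slice.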

Quickly generating billion-record synthetic databases

Gray

Sundaresan

Englert

et al. 1994

203

Parallelism and its price

Englert

Glasstone

Hasan

1995

SIGMOD Rec.

We describe the use of parallel execution techniques and measure the price of parallel execution in NonStop SQL/MP, a commercial parallel database system from Tandem Computers. NonStop SQL uses intra-operator parallelism to parallelize joins, groupings and scans. Parallel execution consists of starting up several processes and communicating data between them. Our measurements show that (a) startup costs are negligible when processes are reused rather than created afresh, and (b) communication costs are significant: they may exceed the costs of operators such as scan, grouping or join. We also show two counter-examples to the common intuition that parallel execution reduces response time at the expense of increased work: parallel execution may reduce work or may increase response time depending on communication costs. All execution times reported in the paper are scaled. No inferences should be drawn about actual execution times. All query executions reported in the paper were created by bypassing the NonStop SQL optimizer. No inferences should be drawn about the behavior of the optimizer.

A benchmark of NonStop SQL release 2 demonstrating near-linear speedup and scaleup on large databases

Englert

Gray

Kocher

et al. 1990

NonStop SQL is an implementation of ANSI/ISO SQL on Tandem Computer systems. In its second release, NonStop SQL transparently and automatically implements parallelism within an SQL statement. This parallelism allows query execution speed to increase almost linearly as processors and discs are added to the system (speedup). In addition, this parallelism can help jobs restricted to a fixed "batch window". When the job doubles in size, its elapsed processing time will not change if proportionately more equipment is available to process the job (scaleup). This paper describes the parallelism features of NonStop SQL and an audited benchmark that demonstrates these speedup and scaleup claims.
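The two metrics named in this abstract come down to simple ratios. The sketch below assumes the usual definitions (elapsed time on the small system divided by elapsed time on the large one); the function names and all timings are invented for illustration and are not figures from the benchmark.

```python
def speedup(t_small_system, t_large_system):
    """Same job on both systems: how many times faster the large one is."""
    return t_small_system / t_large_system


def scaleup(t_small_job_small_system, t_large_job_large_system):
    """Job and hardware grown by the same factor: 1.0 means elapsed time is unchanged."""
    return t_small_job_small_system / t_large_job_large_system


# Hypothetical timings in seconds, for illustration only.
s = speedup(100.0, 25.0)      # same job, 4x the hardware
u = scaleup(100.0, 100.0)     # 4x the job on 4x the hardware
print(s, s / 4, u)            # speedup, fraction of linear speedup, scaleup
```

Under these definitions, linear speedup on k times the hardware gives a ratio of k, and linear scaleup gives a ratio of 1.0, which matches the "elapsed processing time will not change" wording above.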

Nonstop SQL

Englert

1994
