1994
DOI: 10.1145/190627.190647

Loading databases using dataflow parallelism

Abstract: This paper describes a parallel database load prototype for Digital's Rdb database product. The prototype takes a dataflow approach to database parallelism. It includes an explorer that discovers and records the cluster configuration in a database, a client CUI interface that gathers the load job description from the user and from the Rdb catalogs, and an optimizer that picks the best parallel execution plan and records it in a web data structure. The web describes the data operators, the dataflow rivers amon…

Cited by 26 publications (10 citation statements)
References 3 publications
“…Sequential loads can take a very long time, e.g., loading a terabyte of data can take weeks and months! Hence, pipelined and partitioned parallelism are typically exploited [6]. Doing a full load has the advantage that it can be treated as a long batch transaction that builds up a new database.…”
Section: Load | Type: mentioning
confidence: 99%
“…Parallel databases: Dryad is heavily indebted to the traditional parallel database field [18]: e.g., Vulcan [22], Gamma [17], RDb [11], DB2 parallel edition [12], and many others. Many techniques for exploiting parallelism, including data partitioning; pipelined and partitioned parallelism; and hash-based distribution are directly derived from this work.…”
Section: Dataflow | Type: mentioning
confidence: 99%
“…In database systems, parallel sorting is most heavily used in data loading and index creation [Barclay et al. 1994]. One key issue is data skew and load balancing, especially when using range partitioning [Iyer and Dias 1990; Manku et al. 1998].…”
Section: Parallelism and Threading | Type: mentioning
confidence: 99%