2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018
DOI: 10.1109/icde.2018.00200

Rainbow: Adaptive Layout Optimization for Wide Tables

Cited by 4 publications
(13 citation statements)
References 6 publications
“…We do not consider CPU cost due to its negligible impact compared to I/O cost (existing works [16,3] already proved that this is enough to capture the execution trend). Finally, we do not need any shuffling [3], because we focus only on the first operation loading data and therefore, the networking cost for shuffling is considered to be zero.…”
Section: Estimating Makespan
type: mentioning
confidence: 99%
“…Since huge volumes of data are difficult to be stored on model first load later fashion, organizations end up storing all the raw data on a distributed file system (e.g., HDFS) or cloud storage (e.g., Amazon S3). In addition, they have their own data pipelines to process the raw data, and store it into very wide tables [4,15] using hybrid layouts [3,16], which have built-in support for projection and selection operations, helping in reading data more efficiently from the disk [27,28].…”
Section: Introduction
type: mentioning
confidence: 99%
“…The I/O cost depends on the amount of data read within a task and the disk bandwidth. We do not consider CPU cost due to its negligible impact compared to I/O cost (existing works [2,11] already proved that this is enough to capture the execution trend). Finally, we focus on the first operation loading data, thus networking cost for shuffling is also considered to be zero [2].…”
Section: Task's Cost Estimation
type: mentioning
confidence: 99%
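
The I/O-only cost model described in the statement above can be sketched as follows. This is a minimal illustration, assuming the cost of a data-loading task is simply the bytes it reads divided by the disk bandwidth, with CPU and shuffle costs taken as zero as the statement says. The function name, parameter names, and units are hypothetical and do not come from the citing paper.

```python
def task_io_cost_seconds(bytes_read: int, disk_bandwidth_bytes_per_s: float) -> float:
    """Estimate the cost of one data-loading task from I/O alone.

    Assumption (not from the source): cost = bytes read / disk bandwidth.
    CPU cost and shuffle networking cost are treated as zero.
    """
    if disk_bandwidth_bytes_per_s <= 0:
        raise ValueError("disk bandwidth must be positive")
    return bytes_read / disk_bandwidth_bytes_per_s


# Hypothetical numbers: a task reading 512 MiB from a 128 MiB/s disk.
cost = task_io_cost_seconds(512 * 1024**2, 128 * 1024**2)
```

Under this sketch, reading fewer columns (a smaller `bytes_read`) lowers the estimated cost proportionally, which is why layouts that support projection matter for wide tables.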
“…We do not consider CPU cost due to its negligible impact compared to I/O cost (existing works [2,11] already proved that this is enough to capture the execution trend). Finally, we focus on the first operation loading data, thus networking cost for shuffling is also considered to be zero [2]. However, there is still a networking cost for metadata, because current solutions require to sequentially transfer metadata to all other executors before start processing the data.…”
Section: Task's Cost Estimation
type: mentioning
confidence: 99%
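
The metadata networking cost mentioned in the statement above might be approximated as below. This is only a sketch under stated assumptions: metadata of a fixed size is sent one executor at a time to every other executor, and each transfer takes size divided by network bandwidth. The citing paper's actual formula is not shown in the excerpt, so the function and all parameter names are hypothetical.

```python
def metadata_transfer_cost_seconds(
    metadata_bytes: int,
    num_executors: int,
    network_bandwidth_bytes_per_s: float,
) -> float:
    """Estimate the time to send metadata sequentially to all other executors.

    Assumption (not from the source): transfers do not overlap, so the
    total is (number of receivers) * (metadata size / network bandwidth).
    """
    if network_bandwidth_bytes_per_s <= 0:
        raise ValueError("network bandwidth must be positive")
    receivers = max(num_executors - 1, 0)  # the sender already holds the metadata
    return receivers * metadata_bytes / network_bandwidth_bytes_per_s


# Hypothetical numbers: 10 MiB of metadata, 5 executors, 10 MiB/s network.
meta_cost = metadata_transfer_cost_seconds(10 * 1024**2, 5, 10 * 1024**2)
```

Because the transfers are sequential in this sketch, the estimate grows linearly with the number of executors, and processing cannot start until the last transfer finishes.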