Halt or Continue: Estimating Progress of Queries in the Cloud

Shi, Yong; Meng, Xiaofeng; Liu, Bingbing

doi:10.1007/978-3-642-29035-0_12

Cited by 1 publication

(2 citation statements)

References 11 publications


“…The second-phase sampler allows reducers to perform sampling from mappers' output during the shuffle phase of join_job, and produces distributed stratified random samples from multiple tables respectively. You can find more details in our paper [8].…”

Section: Data Manager (mentioning)

confidence: 99%
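
The citation statement above does not say which sampling algorithm the second-phase sampler uses. As a minimal sketch only, assuming each table is treated as one stratum and that a fixed-size uniform sample per stratum is wanted, per-stratum reservoir sampling (Algorithm R) is one way a single reducer could draw such samples from a stream of shuffled records. The function name, the `key_of` callback, and the record layout are hypothetical and are not taken from COLA or from the cited paper.

```python
import random


def stratified_reservoir_sample(records, key_of, per_stratum, seed=None):
    """Keep a uniform random sample of up to `per_stratum` records per stratum.

    `records` is any iterable (for example, a reducer's input stream) and
    `key_of` maps a record to its stratum (assumed here: the source table).
    """
    rng = random.Random(seed)
    reservoirs = {}  # stratum -> sampled records
    seen = {}        # stratum -> number of records observed so far
    for record in records:
        stratum = key_of(record)
        reservoir = reservoirs.setdefault(stratum, [])
        seen[stratum] = seen.get(stratum, 0) + 1
        if len(reservoir) < per_stratum:
            # Reservoir not full yet: always keep the record.
            reservoir.append(record)
        else:
            # Algorithm R: replace a random slot with probability k / n.
            slot = rng.randrange(seen[stratum])
            if slot < per_stratum:
                reservoir[slot] = record
    return reservoirs


# Hypothetical input: (table_name, row_id) pairs from two tables.
rows = [("orders", i) for i in range(100)] + [("items", i) for i in range(3)]
sample = stratified_reservoir_sample(rows, key_of=lambda r: r[0], per_stratum=5, seed=0)
```

A stratum with fewer records than `per_stratum` is returned whole, so small tables are never under-represented in this sketch.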


COLA: A cloud-based system for online aggregation

Gan; Meng; Shi

2013

2013 IEEE 29th International Conference on Data Engineering (ICDE)


Abstract: Online aggregation is a promising solution to achieving fast early responses for interactive ad-hoc queries that compute aggregates on massive data. To process large datasets on large-scale computing clusters, MapReduce has been introduced as a popular paradigm into many data analysis applications. However, typical MapReduce implementations are not well-suited to analytic tasks, since they are geared towards batch processing. With the increasing popularity of ad-hoc analytic query processing over enormous datasets, processing aggregate queries using MapReduce in an online fashion is therefore an emerging important application need. We present a MapReduce-based online aggregation system called COLA, which provides progressive approximate aggregate answers for both single table and multiple joined tables. COLA provides an online aggregation execution engine with novel sampling techniques to support incremental and continuous computing of aggregation, and minimize the waiting time before an acceptably precise estimate is available. In addition, user-friendly SQL queries are supported in COLA. Furthermore, COLA can implicitly convert non-OLA jobs into an online version so that users don't have to write any special-purpose code to make estimates.


Section: Data Manager (mentioning)

confidence: 99%

“…Then the component computes the most critical path of the PERT network, which can represent the execution of the whole query. Our paper [8] has the detailed discussion about how to make an estimate of the query progress according to the critical path.…”

Section: Online Aggregation Executor (mentioning)

confidence: 99%
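
The statement above mentions computing the critical path of a PERT network and estimating query progress from it, but gives no formulas. The following is a minimal sketch of that general idea, not the estimator from the cited paper: it assumes each task has a known estimated duration, finds the longest path through the task DAG, and reports progress as the completed share of work along that path. The task names, the `done_fraction` input, and the progress formula are illustrative assumptions.

```python
from collections import defaultdict, deque


def critical_path(durations, edges):
    """Return (length, path) of the longest path through a task DAG.

    `durations` maps task -> estimated duration; `edges` is a list of
    (before, after) precedence pairs.
    """
    successors = defaultdict(list)
    indegree = {task: 0 for task in durations}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Process tasks in topological order (Kahn's algorithm).
    ready = deque(task for task, deg in indegree.items() if deg == 0)
    earliest_start = {task: 0.0 for task in durations}
    predecessor = {task: None for task in durations}
    finish = {}
    while ready:
        task = ready.popleft()
        finish[task] = earliest_start[task] + durations[task]
        for nxt in successors[task]:
            # A task can start only after its slowest predecessor finishes.
            if finish[task] > earliest_start[nxt]:
                earliest_start[nxt] = finish[task]
                predecessor[nxt] = task
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(finish) != len(durations):
        raise ValueError("task graph contains a cycle")

    # Walk back from the task that finishes last to recover the path.
    last = max(finish, key=finish.get)
    length = finish[last]
    path = []
    while last is not None:
        path.append(last)
        last = predecessor[last]
    path.reverse()
    return length, path


def progress(durations, edges, done_fraction):
    """Assumed progress measure: completed work on the critical path / its length."""
    length, path = critical_path(durations, edges)
    completed = sum(durations[t] * done_fraction.get(t, 0.0) for t in path)
    return completed / length


# Hypothetical query plan: two scans feed a join, followed by an aggregation.
durations = {"scan_a": 4, "scan_b": 2, "join": 3, "agg": 1}
edges = [("scan_a", "join"), ("scan_b", "join"), ("join", "agg")]
length, path = critical_path(durations, edges)
estimate = progress(durations, edges, {"scan_a": 1.0, "scan_b": 1.0, "join": 0.5})
```

In this sketch only tasks on the critical path affect the estimate, so finishing `scan_b` early does not move the reported progress.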