2011
DOI: 10.14778/3402707.3402748

Online aggregation for large MapReduce jobs

Abstract: In online aggregation, a database system processes a user's aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, with confidence bounds that become tighter over time. In this paper, we consider how online aggregation can be built into a MapReduce system for large-scale data processing. Given the MapReduce paradigm's close relationship with cloud computing (in that one might expect a large fraction of MapReduce jobs to be ru…
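The idea of an estimate whose confidence bounds tighten as more data is processed can be illustrated with a minimal sketch. It assumes records arrive in random order and uses a textbook normal-approximation interval for a running mean; this is not necessarily the estimator the paper itself uses, and the names `online_mean` and `run_until` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def online_mean(stream, z=1.96):
    """Yield (n, estimate, half_width) after each record.

    Uses Welford's single-pass update for mean and variance, and a
    normal-approximation interval (z=1.96 is roughly 95% confidence).
    Assumption: records are seen in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            variance = m2 / (n - 1)
            half_width = z * math.sqrt(variance / n)
        else:
            half_width = float("inf")  # no variance estimate from one record
        yield n, mean, half_width


def run_until(stream, target_half_width, min_records=30):
    """Consume the stream until the interval is narrow enough, then stop.

    Returns the last (n, estimate, half_width), or None for an empty stream.
    """
    last = None
    for n, estimate, half_width in online_mean(stream):
        last = (n, estimate, half_width)
        if n >= min_records and half_width <= target_half_width:
            break
    return last


# Synthetic data standing in for a large input (illustrative values only).
random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(100_000)]
random.shuffle(data)

n_used, estimate, half_width = run_until(data, target_half_width=0.5)
print(f"stopped after {n_used} records: {estimate:.2f} +/- {half_width:.2f}")
```

The half-width shrinks roughly in proportion to 1/sqrt(n), which is why a user can trade accuracy for time by stopping early. In a real MapReduce setting, data is typically consumed in blocks rather than in a uniformly random record order, so the random-order assumption made here would need separate justification.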

Cited by 121 publications (52 citation statements)
References 15 publications (16 reference statements)
“…Users can stop the execution whenever the error bound meets their requirement. Some efforts have been focused on implementing online aggregation in MapReduce environments [7,14].…”
Section: Related Work (mentioning)
confidence: 99%
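“…Reference [48] studied kernel density estimation, an important fundamental problem in data analysis, and proposed two classes of algorithms, randomized and deterministic, that outperform existing algorithms by several orders of magnitude. Reference [43] proposed an online aggregation algorithm for big data based on MapReduce. Reference [121] studied set-relationship analysis over streaming data based on MapReduce and proposed a high-performance parallel algorithm built on data partitioning, redundant storage, and computational load balancing.…”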
unclassified
“…Answering this query requires accessing all location and air pollution measurements in the time period of interest, which can be substantial for long periods. To solve this problem, researchers have proposed approximate query processing algorithms (JERMAINE et al, 2007; AGARWAL et al, 2013; OOI; TAN, 2010; BABCOCK; DATAR; MOTWANI, 2004; PANSARE et al, 2011; POTTI; PATEL, 2015; LAZARIDIS;) that approximate the query result by looking at a subset of the data.…”
Section: Publications (mentioning)
confidence: 99%
“…On the other hand, if the user demands a lower error, the algorithm will be able to satisfy the request by visiting lower levels of the segment trees (which exact nodes will be visited also depends on the query and the interplay of the time series in it). Leveraging the trees, PlatoDB can even provide users with continuously improving approximate answers and error guarantees, allowing them to stop the computation at any time, similar to works in online aggregation (WANG, 1997; CONDIE et al, 2010; PANSARE et al, 2011).…”
Section: A2 System Architecture (mentioning)
confidence: 99%