Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data 2010
DOI: 10.1145/1807167.1807238

Continuous sampling for online aggregation over multiple queries

Abstract: In this paper, we propose an online aggregation system called COSMOS (Continuous Sampling for Multiple queries in an Online aggregation System), to process multiple aggregate queries efficiently. In COSMOS, a dataset is first scrambled so that sequentially scanning the dataset gives rise to a stream of random samples for all queries. Moreover, COSMOS organizes queries into a dissemination graph to exploit the dependencies across queries. In this way, aggregates of queries closer to the root (source of data fl…
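
The scrambling idea in the abstract can be illustrated with a small sketch. This is not the COSMOS implementation and does not model the dissemination graph; it only assumes that a dataset is shuffled once, so that every prefix of a sequential scan is a uniform random sample, and then reports a running AVG with a normal-approximation confidence interval. The function names `scramble` and `online_avg`, the synthetic data, and the 95% level are all assumptions made for illustration.

```python
import math
import random


def scramble(rows, seed=0):
    """Return a randomly permuted copy, so any prefix is a uniform random sample."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def online_avg(scrambled_rows, report_every=1000, z=1.96):
    """Yield (rows_seen, running_mean, ci_half_width) during a sequential scan.

    Mean and variance are updated with Welford's method. The interval is a
    normal-approximation interval (about 95% for z=1.96) and ignores the
    finite-population correction, so it is conservative near the end of the scan.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in scrambled_rows:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)


if __name__ == "__main__":
    data = [float(i % 100) for i in range(100_000)]  # hypothetical column
    for seen, estimate, half_width in online_avg(scramble(data), report_every=20_000):
        print(f"{seen:>7} rows: AVG ~ {estimate:.2f} +/- {half_width:.2f}")
```

Because the shuffle happens once up front, the scan itself needs no random access; the estimate simply tightens as more rows are read.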

Cited by 62 publications (46 citation statements)
References: 24 publications

“…Standard statistical formulas can help us get unbiased estimators and estimate the confidence interval. A lot of previous work [13][14][15][16] have made great contributions on this problem.…”
Section: Sampling (mentioning)
confidence: 99%
“…Since then, research on online aggregation has been actively pursued. Xu et al [14] studied online aggregation with group by clause and Wu et al [16] proposed a continuous sampling algorithm for online aggregation over multiple queries. Qin and Rusu [27] extended online aggregate to distributed and parallel environments.…”
Section: Related Work (mentioning)
confidence: 99%
“…Another extension to MapReduce has been to address continuous processing such as stream processing [Stephens 1997;Golab and Özsu 2010] or online aggregation [Hellerstein et al 1997;Wu et al 2010b]. Recall that a sort-merge process is accomplished by the mapper and reducer modules.…”
Section: Streams and Continuous Query Processing (mentioning)
confidence: 99%
“…There are many different methods to sample a data warehouse I [1,12,13,17,19] and we consider two specific techniques:…”
Section: Sampling a Data Warehouse (mentioning)
confidence: 99%
“…This subject has become important in the context of streaming data [4,15,19]. In our approach, we consider the L1 distance between distributions: two answers are ε-close if the L1 distance between two distributions is less than ε.…”
Section: Introduction (mentioning)
confidence: 99%