2009
DOI: 10.14778/1687627.1687677

Composable, scalable, and accurate weight summarization of unaggregated data sets

Abstract: Many data sets occur as unaggregated data sets, where multiple data points are associated with each key. In the aggregate view of the data, the weight of a key is the sum of the weights of data points associated with the key. Examples are measurements of IP packet header streams, distributed data streams produced by events registered by sensor networks, and Web page or multimedia requests to content distribution servers. We aim to combine sampling and aggregation to provide accurate and efficient summaries of …
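
The aggregate view described in the abstract can be illustrated with a minimal sketch. It only shows how per-key weights arise from unaggregated data points and is not the paper's summarization method. The function name `aggregate` and the sample packet data are assumptions made for illustration.

```python
from collections import defaultdict

def aggregate(points):
    """Sum the weights of all data points that share a key."""
    totals = defaultdict(float)
    for key, weight in points:
        totals[key] += weight
    return dict(totals)

# Hypothetical unaggregated stream: (flow key, packet size in bytes).
packets = [("10.0.0.1", 40.0), ("10.0.0.2", 1500.0), ("10.0.0.1", 60.0)]
aggregated = aggregate(packets)
```

Here the key `10.0.0.1` appears in two data points, so its weight in the aggregate view is the sum of both packet sizes.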

Cited by 5 publications (10 citation statements)
References 31 publications (38 reference statements)
“…Haas and König [23] proposed a new sampling scheme, which combines the row-level and page-level samplings in the field of relational DBMS. Data sampling is also well used in the field of distributed and streaming environments [24], [25]. Histogram is another important technique for selectivity estimation.…”
Section: Related Work (mentioning)
confidence: 99%
“…There is a large body of work on computing statistics over unaggregated data which we can not hope to cover here. The toolbox includes deterministic algorithms [29], other sampling algorithms [8], and Linear sketches (random linear projections) [26], [1], [25], [13]. Deterministic algorithms work well for approximate heavy hitters and quantiles. Linear sketches project the key-weight vectors to a lower dimensional vector.…”
Section: Related Work (mentioning)
confidence: 99%
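
The idea of projecting a key-weight vector to a lower-dimensional vector can be sketched as follows. This is a minimal illustration using one row of signed hash counters, chosen here as an assumption; it is not taken from any of the works cited in the statement above. The helper names `_hash` and `sketch`, the width of 8, and the sample data are all hypothetical.

```python
import hashlib

def _hash(key, seed):
    """Deterministic 64-bit hash of a key (illustrative helper)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def sketch(points, width=8, seed=0):
    """Project (key, weight) points onto `width` signed counters."""
    counters = [0.0] * width
    for key, weight in points:
        h = _hash(key, seed)
        bucket = h % width                      # which counter the key maps to
        sign = 1.0 if (h >> 32) & 1 else -1.0   # pseudo-random sign per key
        counters[bucket] += sign * weight
    return counters

# Two hypothetical partitions of an unaggregated data set.
part_a = [("a", 2.0), ("b", 1.0)]
part_b = [("a", 3.0), ("c", 4.0)]

# Adding the sketches of the parts gives the sketch of the whole data set.
merged = [x + y for x, y in zip(sketch(part_a), sketch(part_b))]
```

Because the projection is linear, data points for the same key can arrive in any order or at different sites and the per-site sketches can simply be added.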
“…In this example, val\]func simply returns the current number of connection attempts for a host. However, the function could be more complex than that. In our application, one could for example instead implement a threshold relative to the number of successful connections.…”
Section: User Interface (mentioning)
confidence: 99%
“…Cohen et al. [4] present an abstract framework for weighted sampling in distributed settings. It is similar in intent to our work, however, it only considers the case of sampling, and evaluates optimal algorithms for this setting.…”
Section: Communication Overhead (mentioning)
confidence: 99%