Proceedings of the 21st ACM International Conference on Information and Knowledge Management 2012
DOI: 10.1145/2396761.2398438

Robust distributed indexing for locality-skewed workloads

Abstract: Multidimensional indexing is crucial for enabling a fast search over large-scale data. Owing to the unprecedented scale of data, extending such indexing technology has recently gained attention in distributed environments. The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing oft…

Cited by 2 publications (1 citation statement)
References 28 publications
“…H can be set as a multiple of the maximum number of parallel map tasks running in a cluster (the details of setting H are discussed in the Appendix B). In general, any hash function (e.g., [28]) which can partition objects into groups that keep the same distribution of objects as the overall distribution can be adopted here. In our experiments, since the identifiers of trajectory objects in the dataset are uniformly distributed, we simply hash the objects according to their identifiers, i.e., the hash function is a simple modulo function hash(tr)=tr.id%H.…”
Section: Trajectory Join (mentioning)
confidence: 99%
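
The modulo hashing described in the citation statement above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the citing authors' implementation: the function name `hash_partition`, the dictionary-based trajectory records, and the value `H = 4` are all hypothetical, and the citing paper sets H as a multiple of the maximum number of parallel map tasks rather than a fixed constant.

```python
from collections import defaultdict


def hash_partition(objects, num_groups):
    """Group objects by id % num_groups (hypothetical helper, not from the paper)."""
    groups = defaultdict(list)
    for obj in objects:
        # Mirrors hash(tr) = tr.id % H from the citation statement.
        groups[obj["id"] % num_groups].append(obj)
    return dict(groups)


# Assumed toy data: trajectory records with consecutive integer identifiers.
trajectories = [{"id": i} for i in range(12)]
H = 4  # assumed group count, chosen only for illustration

groups = hash_partition(trajectories, H)
sizes = {group_id: len(members) for group_id, members in groups.items()}
```

With identifiers spread evenly, as the statement assumes for its dataset, every group receives the same number of objects. Skewed identifiers would break that balance, which is why the statement notes that any hash function preserving the overall distribution could be used instead.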