2013
DOI: 10.14778/2556549.2556574

Streaming similarity search over one billion tweets using parallel locality-sensitive hashing

Abstract: Finding nearest neighbors has become an important operation on databases, with applications to text search, multimedia indexing, and many other areas. One popular algorithm for similarity search, especially for high-dimensional data (where spatial indexes like k-d trees do not perform well), is Locality Sensitive Hashing (LSH), an approximation algorithm for finding similar objects. In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH), designed to be extremely efficient, capable of scaling ou…
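To make the idea of LSH-based similarity search concrete, here is a minimal sketch using random-hyperplane hashing for angular similarity. It is a generic textbook illustration, not the PLSH design described in the paper; the names `LSHIndex`, `make_hyperplanes`, and `signature`, and the defaults for `num_tables` and `num_bits`, are assumptions chosen for this example.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random hyperplane normals with Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """Pack the sign of each projection into one integer bucket key."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


class LSHIndex:
    """Hypothetical multi-table LSH index (illustrative only)."""

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        rng = random.Random(seed)
        # One independent set of hyperplanes per hash table.
        self.planes = [make_hyperplanes(num_bits, dim, rng)
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def insert(self, key, vec):
        """Add key to the matching bucket of every table."""
        for planes, table in zip(self.planes, self.tables):
            table[signature(vec, planes)].append(key)

    def candidates(self, vec):
        """Return keys sharing a bucket with vec in at least one table."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(signature(vec, planes), ()))
        return found


index = LSHIndex(dim=3)
index.insert("a", [1.0, 0.5, -0.2])
nearby = index.candidates([2.0, 1.0, -0.4])  # same direction as "a"
```

Vectors pointing in similar directions tend to fall on the same side of most hyperplanes, so they share buckets with high probability. The candidate set would still need an exact distance check before results are reported.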

Cited by 90 publications (88 citation statements)
References 19 publications
“…While there are several real-time/streaming applications in high-dimensional spaces that leverage LSH, there have been few works [29,30,31] that aim at improving LSH for streaming applications. In [29], the authors present a parallel LSH framework that is designed to handle similarity searches on incoming twitter data. The goal of their cache-conscious model is to improve on the creation and updation of the hash tables (which are based on the original LSH design).…”
Section: Real-time Variants Of Locality Sensitive Hashing (mentioning)
confidence: 99%
“…The second element computes the correlations, performs the significance test and forwards qualified pairs to the last element, where duplicate removal is performed. LSHC: LSHC is based on locality sensitive hashing (LSH) [21], which use the property that the normalized sliding windows of significant correlated time series are close in Euclidean space (refer Section IV). The topology of LSHC consists of three processing elements.…”
Section: A Baselines (mentioning)
confidence: 99%
“…Sliding windows that are mapped to a bucket in each hash table are shuffled to the same task of the second element, where the correlation computation is performed over the sliding windows in each bucket per hash table. LSHC parameters are chosen to minimize the processing latency while ensuring the failure probability (i.e., the probability of not reporting a certain qualified pair) at 5% [21]. Likewise, the last element aggregates correlated time series pairs…”
Section: A Baselines (mentioning)
confidence: 99%
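The statement above mentions choosing LSH parameters so that the failure probability stays at 5%. A rough sketch of the standard calculation follows. It assumes independent hash functions, `k` hash functions concatenated per table, and a known single-function collision probability `p` for a qualifying pair. The function names are hypothetical, and this is not necessarily the procedure used in the cited work.

```python
import math


def miss_probability(p, k, num_tables):
    """Probability that a pair collides in none of the tables.

    A pair shares a bucket in one table with probability p**k, so it is
    missed by all num_tables tables with probability (1 - p**k)**num_tables.
    """
    return (1.0 - p ** k) ** num_tables


def tables_needed(p, k, delta):
    """Smallest table count whose miss probability is at most delta."""
    if not (0.0 < p < 1.0 and 0.0 < delta < 1.0 and k >= 1):
        raise ValueError("need 0 < p < 1, 0 < delta < 1, k >= 1")
    per_table_hit = p ** k
    return math.ceil(math.log(delta) / math.log(1.0 - per_table_hit))


# Assumed example values: p = 0.9, k = 10, target failure probability 5%.
num_tables = tables_needed(0.9, 10, 0.05)
achieved = miss_probability(0.9, 10, num_tables)
```

Raising `k` makes buckets more selective and lowers per-bucket work, but it also requires more tables to hold the same failure probability, which is the latency trade-off the statement alludes to.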
“…Since many databases, networks, and file systems benefit from the quick filtering of negative queries (often to avoid costly disk or network accesses), AMQs have found wide use. Such applications are emerging research areas on GPUs [2], [3], [4].…”
Section: Introduction (mentioning)
confidence: 99%