Efficient and effective KNN sequence search with approximate n-grams

Wang, Xiaoli; Ding, Xiuli; Tung, Anthony K. H.; Zhang, Zhenjie

doi:10.14778/2732219.2732220

Cited by 23 publications

(23 citation statements)

References 29 publications


“…As we can see from Figure 1, query $Q_1$ to retrieve the tuples with conditions (A, [1,2]), (B, [1,1]), (C, [2,3]…”

Section: A Match-count Model (mentioning)

confidence: 99%

“…Finally the output of the match-count model is the sum of the integers $MC(Q, O) = \sum_{r_i \in Q} C(r_i, O)$. For example, in Figure 1, for $Q_1$ and $O_1$ we have $C((A, [1,2]), O_1) = 1$, $C((B, [1,1]), O_1) = 0$ and $C((C, [2,3]), O_1) = 0$, then we have $MC(Q_1, O_1) = 1 + 0 + 0 = 1$.…”

Section: A Match-count Model (mentioning)

confidence: 99%
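
The match-count arithmetic quoted above can be sketched in a few lines of Python. This is only an illustrative sketch, not the citing paper's implementation: it assumes each object holds one integer value per attribute, so that C(r_i, O) is 0 or 1, and the values chosen for `O1` are hypothetical (Figure 1 is not reproduced here); they were picked only so that the per-item counts agree with the quoted 1, 0 and 0. The names `item_count` and `match_count` are likewise made up for this example.

```python
from typing import Dict, List, Tuple

# A query item is (attribute, (low, high)); an object maps attribute -> value.
QueryItem = Tuple[str, Tuple[int, int]]


def item_count(item: QueryItem, obj: Dict[str, int]) -> int:
    """C(r_i, O): 1 if the object's value for the attribute lies in [low, high], else 0."""
    attr, (low, high) = item
    value = obj.get(attr)
    return 1 if value is not None and low <= value <= high else 0


def match_count(query: List[QueryItem], obj: Dict[str, int]) -> int:
    """MC(Q, O): sum of C(r_i, O) over all items r_i in the query."""
    return sum(item_count(item, obj) for item in query)


# Query Q1 as given in the quoted statement.
Q1: List[QueryItem] = [("A", (1, 2)), ("B", (1, 1)), ("C", (2, 3))]

# Hypothetical object values (assumption): A=1 falls in [1,2]; B=2 and C=1 miss their ranges.
O1 = {"A": 1, "B": 2, "C": 1}

print(match_count(Q1, O1))  # prints 1, i.e. 1 + 0 + 0
```

Ranking objects by this count, highest first, is what turns the model into a top-k similarity search.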

“…2. How to organize data structures as inverted indexes has been extensively investigated by previous literature [2], [10], [11] and it is beyond the scope of this paper. 1) Transformed by LSH.…”

Section: B Genie With Lsh and Sa (mentioning)

confidence: 99%

“…Initially we have AT = 1, ZA = [0, 0, 0], BC = {$O_1$: 0, $O_2$: 0, $O_3$: 0} and HT = ∅. For easy explanation, we assume the postings lists matched by $Q_1$ are scanned with the order of (A, [1,2]), (B, [1,1]) and (C, [2,3]). (On the GPU they are processed with multiple blocks in parallel with random order.…”

Section: Count Priority Queue (mentioning)

confidence: 99%

“…For example, image matching is often done by extracting hundreds of high dimensional SIFT (scale-invariant feature transform) features and matching them against SIFT features in the database. Parallelization for similarity search is required for high performance on modern hardware architectures [1], [2], [3], [4].…”

Section: Introduction (mentioning)

confidence: 99%


A Generic Inverted Index Framework for Similarity Search on the GPU

Zhou, Guo, Jagadish, et al. 2018

2018 IEEE 34th International Conference on Data Engineering (ICDE)

Self Cite


We propose a novel generic inverted index framework on the GPU (called GENIE), aiming to reduce the programming complexity of the GPU for parallel similarity search of different data types. Not every data type and similarity measure are supported by GENIE, but many popular ones are. We present the system design of GENIE, and demonstrate similarity search with GENIE on several data types along with a theoretical analysis of search results. A new concept of locality sensitive hashing (LSH) named τ-ANN search, and a novel data structure c-PQ on the GPU are also proposed for achieving this purpose. Extensive experiments on different real-life datasets demonstrate the efficiency and effectiveness of our framework. The implemented system has been released as open source.
