Incremental all pairs similarity search for varying similarity thresholds

Awekar, Amit; Samatova, Nagiza F.; Breimyer, Paul

doi:10.1145/1731011.1731012

Cited by 5 publications

(5 citation statements)

References 13 publications

“…The above studies focus on finding binary or non-binary pairs with some specific similarity measures above some given thresholds. Recently, Awekar et al [4] studied the problem of searching candidate pairs incrementally for varying similarity thresholds. Xiao et al [35] studied the top-K set similarity joins problem for near duplicate detection, which enumerated all the "necessary" similarity thresholds in the decreasing order until the top-K set had been found.…”

Section: Mining Interesting Patterns (mentioning)

confidence: 99%

“…In other words, in the initial stage, we push P[1,2] and P[2,3] (P[i,j] is the pair of item[i] and item[j], given i≤j) into the top-2 list, and compute their cosine values. Then, in the updating stage, we traverse along the diagonals (denoted by the dash-dotted line) in the sorted item-matrix to check in sequence whether P[3,4], P[4,5], P[5,6], P[4,6], P[3,5], …, P[1,6] can enter the top-2 list, as shown in Fig. 1.…”

Section: 222 (mentioning)

confidence: 99%
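The quoted traversal can be sketched as a generator over the pairs P[i,j] of an n-item sorted matrix. The function name `diagonal_order` and the direction inside each diagonal (increasing i on odd diagonals, decreasing i on even ones) are assumptions, chosen only because they reproduce the listed prefix P[3,4], P[4,5], P[5,6], P[4,6], P[3,5]; the elided part of the cited paper's order may differ.

```python
from typing import Iterator, Tuple

Pair = Tuple[int, int]

def diagonal_order(n: int) -> Iterator[Pair]:
    """Yield every pair (i, j), 1 <= i < j <= n, one diagonal at a time.

    Diagonal d holds the pairs with j - i == d. The "snake" direction
    (odd d: increasing i, even d: decreasing i) is an assumption.
    """
    for d in range(1, n):
        if d % 2 == 1:
            rows = range(1, n - d + 1)   # i = 1, 2, ..., n - d
        else:
            rows = range(n - d, 0, -1)   # i = n - d, ..., 2, 1
        for i in rows:
            yield (i, i + d)

order = list(diagonal_order(6))
initial, updating = order[:2], order[2:]   # top-2 list seeded with the first two pairs
print(initial)         # [(1, 2), (2, 3)]
print(updating[:5])    # [(3, 4), (4, 5), (5, 6), (4, 6), (3, 5)]
print(updating[-1])    # (1, 6)
```

Seeding the list from the first diagonal is consistent with the quote: neighbouring items in the support-sorted matrix have the closest supports, so their cosine upper bounds tend to be the largest.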

“…For example, for the sorted item-matrix in Fig. 1, if P[3,4] cannot enter the top-2 list because upper(cos(P[3,4])) ≤ minCos, then all the pairs in the upper-right corner of P[3,4] will also fail to enter the list, as shown by the shaded area in Fig. 1.…”

Section: Theorem 1 Given the Current Top-k List And Its Mincos In A (mentioning)

confidence: 99%

“…The vector of "stage 2" shows the updated values. Next, suppose P[4,7] is the third pair with upper(cos(P[4,7])) ≤ minCos; the boundary vector will then be further updated to the one of "stage 3" accordingly. Now, given the asymptotic boundary vector above, we have the following criterion to decide whether an item pair should be pruned or not.…”

Section: Boundary Vector For the Pruning Status (mentioning)

confidence: 99%
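The quotes do not give the boundary vector's layout or the exact pruning criterion. One hypothetical encoding stores, for each column j, the largest row index whose pair lies in an already-pruned corner; a pair P[a,b] is then pruned exactly when a does not exceed that value. The class `BoundaryVector` and the first two failing pairs below (P[3,5], P[2,4]) are made up for illustration; only P[4,7] as the third failing pair comes from the quoted example.

```python
from typing import List

class BoundaryVector:
    """Hypothetical pruning-status record for an n-item sorted matrix.

    bound[j] (1-based column j) is the largest row i such that P[i, j] lies
    in an already-pruned upper-right corner, or 0 if column j has none.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.bound: List[int] = [0] * (n + 1)  # index 0 unused

    def prune_corner(self, i: int, j: int) -> None:
        """Record that upper(cos(P[i, j])) <= minCos: prune P[a, b] for a <= i, b >= j."""
        for b in range(j, self.n + 1):
            self.bound[b] = max(self.bound[b], i)

    def is_pruned(self, a: int, b: int) -> bool:
        """P[a, b] is pruned iff some failed P[i, j] has a <= i and j <= b."""
        return a <= self.bound[b]

bv = BoundaryVector(7)
bv.prune_corner(3, 5)   # hypothetical stage 1
bv.prune_corner(2, 4)   # hypothetical stage 2
bv.prune_corner(4, 7)   # stage 3: P[4,7] is the third failing pair
print(bv.bound[1:])     # [0, 0, 0, 2, 3, 3, 4]
print(bv.is_pruned(1, 6), bv.is_pruned(4, 6))  # True False
```

Because each update touches every column to the right, the stored values never decrease from left to right, so the vector traces a staircase around the union of pruned corners.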

“…For example, in the above case, after the traversal of the third diagonal, since the only unpruned item pair, P[4,7], has a cosine upper bound less than minCos, we can safely stop our search and return the current top-2 list as the final result. The final boundary vector, i.e., the one of "stage 3", is indicated by the shaded areas of Fig.…”

mentioning

confidence: 99%
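The quoted steps (seed the list, walk the diagonals, prune corners, stop early) can be combined into one end-to-end sketch. It assumes binary transaction data, the support-based bound sqrt(supp(j) / supp(i)) for support-sorted items, and a per-column "largest pruned row" array as the boundary vector; the function name `top_k_cosine_pairs` is hypothetical, and each diagonal is walked with increasing row index, which may differ from the cited paper's order. Stopping when a whole diagonal is pruned is safe here because every pair on a later diagonal lies in the upper-right corner of some pair on that diagonal.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

def top_k_cosine_pairs(
    transactions: List[Set[str]], k: int
) -> List[Tuple[float, str, str]]:
    """Sketch: top-k item pairs by cosine on binary transaction data.

    Walks the support-sorted item matrix diagonal by diagonal, prunes the
    upper-right corner of any pair whose bound is <= minCos, and stops once
    a whole diagonal is pruned. Assumes k >= 1; ties at minCos are dropped.
    """
    if k < 1:
        return []
    supp: Dict[str, int] = {}
    for t in transactions:
        for x in t:
            supp[x] = supp.get(x, 0) + 1
    # Sort items by non-increasing support (name breaks ties for determinism).
    items = sorted(supp, key=lambda x: (-supp[x], x))
    n = len(items)

    def cos(i: int, j: int) -> float:
        a, b = items[i], items[j]
        both = sum(1 for t in transactions if a in t and b in t)
        return both / math.sqrt(supp[a] * supp[b])

    def upper(i: int, j: int) -> float:
        # Valid because i < j implies supp(items[i]) >= supp(items[j]).
        return math.sqrt(supp[items[j]] / supp[items[i]])

    heap: List[Tuple[float, int, int]] = []  # min-heap of (cos, i, j); when full, heap[0][0] is minCos
    bound = [-1] * n  # bound[b]: largest pruned row in column b, -1 if none

    for d in range(1, n):  # diagonal d holds the pairs with j - i == d
        alive = False  # did any pair on this diagonal survive pruning?
        for i in range(n - d):
            j = i + d
            if i <= bound[j]:
                continue  # already inside a pruned corner
            if len(heap) == k and upper(i, j) <= heap[0][0]:
                for b in range(j, n):  # prune the whole upper-right corner
                    bound[b] = max(bound[b], i)
                continue
            alive = True
            c = cos(i, j)
            if len(heap) < k:
                heapq.heappush(heap, (c, i, j))
            elif c > heap[0][0]:
                heapq.heapreplace(heap, (c, i, j))
        if not alive:
            break  # every pair on later diagonals lies in a pruned corner
    return sorted(((c, items[i], items[j]) for c, i, j in heap), reverse=True)

tx = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a", "c"},
      {"b", "c"}, {"c", "d"}, {"a"}]
for c, x, y in top_k_cosine_pairs(tx, 2):
    print(f"{x}-{y}: {c:.4f}")
# a-b: 0.6708
# c-d: 0.5774
```

On this toy data the pair P[b,d] fails its bound on the second diagonal, which prunes P[a,d] as well, so the third diagonal is skipped without computing any cosine.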
