We present a scalable approach for range and k-nearest-neighbor queries under computationally expensive metrics, such as the continuous Fréchet distance on trajectory data. Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories, regardless of the trajectories’ individual sizes or the spatial dimension, which allows one to exploit the low “intrinsic dimensionality” of datasets for effective search space pruning.
Since the distance computation is expensive, generic metric indexing methods are rendered impractical. We present strategies that (i) improve on known upper and lower bound computations, (ii) build cluster trees with few or no distance calls, and (iii) search using bounds for metric pruning, interval orderings for reduction, and randomized pivoting for reporting the final results.
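To illustrate how cheap bounds can stand in for an expensive distance call, the following is a minimal sketch of a range query that consults a lower and an upper bound before falling back to the exact computation. It is not the method described here: the discrete Fréchet distance serves as a stand-in for the expensive metric, and the function names, the endpoint lower bound, and the greedy-coupling upper bound are illustrative assumptions. Both bounds are also valid for the continuous Fréchet distance, since every matching pairs the start points and the end points, and the cost of any one coupling can only overestimate the optimum.

```python
import math

def discrete_frechet(P, Q):
    """Stand-in for the expensive metric: O(|P||Q|) dynamic program."""
    prev = None
    for p in P:
        cur = []
        for j, q in enumerate(Q):
            d = math.dist(p, q)
            if prev is None and j == 0:
                cost = d
            elif prev is None:
                cost = max(cur[j - 1], d)
            elif j == 0:
                cost = max(prev[0], d)
            else:
                cost = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
            cur.append(cost)
        prev = cur
    return prev[-1]

def lower_bound(P, Q):
    """Every coupling matches first with first and last with last."""
    return max(math.dist(P[0], Q[0]), math.dist(P[-1], Q[-1]))

def upper_bound(P, Q):
    """Cost of one greedy coupling, computed in O(|P| + |Q|) steps."""
    i = j = 0
    cost = math.dist(P[0], Q[0])
    while i < len(P) - 1 or j < len(Q) - 1:
        steps = []
        if i < len(P) - 1:
            steps.append((i + 1, j))
        if j < len(Q) - 1:
            steps.append((i, j + 1))
        if i < len(P) - 1 and j < len(Q) - 1:
            steps.append((i + 1, j + 1))
        # Advance to the neighbouring pair with the smallest distance.
        i, j = min(steps, key=lambda s: math.dist(P[s[0]], Q[s[1]]))
        cost = max(cost, math.dist(P[i], Q[j]))
    return cost

def range_query(trajectories, Q, radius):
    """Return (indices within radius, number of exact distance calls)."""
    hits, exact_calls = [], 0
    for idx, P in enumerate(trajectories):
        if lower_bound(P, Q) > radius:
            continue                      # pruned: certainly too far
        if upper_bound(P, Q) <= radius:
            hits.append(idx)              # accepted: certainly close enough
            continue
        exact_calls += 1                  # bounds are inconclusive
        if discrete_frechet(P, Q) <= radius:
            hits.append(idx)
    return hits, exact_calls
```

In this sketch the exact distance is evaluated only when the interval between the two bounds straddles the query radius.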
We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world datasets. The results show improvement over state-of-the-art methods for exact queries, and even further speedups are achieved for queries that may return approximate results. Surprisingly, the majority of exact nearest-neighbor queries on real datasets are answered without any distance computations.
We study the problem of sub-trajectory nearest-neighbor queries on polygonal curves under the continuous Fréchet distance. Given an n-vertex trajectory P and an m-vertex query trajectory Q, we seek to report a vertex-aligned sub-trajectory P′ of P that is closest to Q, i.e., P′ must start and end on contiguous vertices of P. Since in real data P typically contains a very large number of vertices, we focus on answering queries, without restrictions on P or Q, using only precomputed structures of 𝒪(n) size.
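To make the problem statement concrete, here is a hedged brute-force sketch that enumerates every vertex-aligned sub-trajectory of P. It uses the discrete Fréchet distance as a simpler stand-in for the continuous one, allows single-vertex sub-trajectories, and is not claimed to match any algorithm from this work; the function name `best_subtrajectory` is hypothetical. For each start vertex, one dynamic program over the remaining vertices yields the distance to Q for every possible end vertex, giving 𝒪(n²m) time overall.

```python
import math

def best_subtrajectory(P, Q):
    """Return (distance, start, end) minimising the discrete Fréchet
    distance between P[start..end] (inclusive) and Q."""
    best = (math.inf, -1, -1)
    for start in range(len(P)):
        prev = None  # DP row for the previous vertex of the sub-trajectory
        for end in range(start, len(P)):
            cur = []
            for j, q in enumerate(Q):
                d = math.dist(P[end], q)
                if prev is None and j == 0:
                    cost = d
                elif prev is None:
                    cost = max(cur[j - 1], d)
                elif j == 0:
                    cost = max(prev[0], d)
                else:
                    cost = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
                cur.append(cost)
            # cur[-1] is the distance between P[start..end] and all of Q.
            if cur[-1] < best[0]:
                best = (cur[-1], start, end)
            prev = cur
    return best
```

The quadratic number of candidate sub-trajectories is what makes such exhaustive approaches impractical for very long P.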
We consider three baseline algorithms obtained as straightforward extensions of known work; however, their performance is impractical on realistic inputs. Therefore, we propose a new Hierarchical Simplification Tree (HST) data structure and an adaptive clustering-based query algorithm that efficiently explores relevant parts of P. The core of our query methods is a novel greedy-backtracking algorithm that solves the Fréchet decision problem using 𝒪(n+m) space and 𝒪(nm) time in the worst case.
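For readers unfamiliar with the decision problem (is the distance between P and Q at most a threshold ε?), the sketch below decides it for the discrete Fréchet distance with a row-by-row reachability table. This is a textbook-style dynamic program, not the greedy-backtracking algorithm mentioned above, and it handles only the discrete variant; the function name is hypothetical and both curves are assumed to be non-empty. It does share the same asymptotic profile: 𝒪(nm) time and, because only two rows are kept, 𝒪(m) working space beyond the inputs.

```python
import math

def frechet_decide_discrete(P, Q, eps):
    """True iff the discrete Fréchet distance between P and Q is <= eps."""
    prev = None  # reachability of the previous row
    for p in P:
        cur = [False] * len(Q)
        for j, q in enumerate(Q):
            if math.dist(p, q) > eps:
                continue  # pair (p, q) is too far apart: cell is blocked
            if prev is None:
                cur[j] = (j == 0) or cur[j - 1]
            elif j == 0:
                cur[j] = prev[0]
            else:
                cur[j] = prev[j] or prev[j - 1] or cur[j - 1]
        if not any(cur):
            return False  # no reachable cell in this row: stop early
        prev = cur
    return prev[-1]
```

A distance value can then be recovered, if needed, by searching over candidate thresholds with repeated calls to the decision procedure.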
Experiments on real and synthetic data show that our heuristic effectively prunes the search space and greatly reduces computations compared to baseline approaches.