A common query in many database applications is finding approximate matches to a given query item in a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance-based index structures have been proposed for applications where distance computations between objects of the data domain are expensive (as with high-dimensional data) and the distance function is a metric. In this paper we consider using distance-based index structures for similarity queries on large metric spaces. We elaborate on the approach that uses reference points (vantage points) to partition the data space hierarchically into spherical shell-like regions. We introduce the multi-vantage point tree (mvp-tree), which uses more than one vantage point to partition the space into spherical cuts at each level. In answering similarity-based queries, the mvp-tree also uses the distances between the data points and the vantage points that were precomputed at construction time. We summarize experiments comparing mvp-trees to vp-trees, which have a similar partitioning strategy but use only one vantage point at each level and do not make use of the precomputed distances. Empirical studies show that the mvp-tree outperforms the vp-tree by 20% to 80% for varying query ranges and different distance distributions. Next, we generalize the idea of using multiple vantage points and discuss experiments we conducted to see how varying the number of vantage points in a node affects search performance, and how much performance is gained by using the precomputed distances. The results show that it may be best to use a large number of vantage points in an internal node, so as to end up with a single directory node, and to keep as many of the precomputed distances as possible to provide more efficient filtering during search operations.
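The filtering role of precomputed distances described above can be illustrated with a small sketch. This is not the mvp-tree itself: it is a flat table of point-to-vantage-point distances, and the names `build_table` and `range_search`, the Euclidean metric, and the sample data are all assumptions made for illustration. It only shows how the triangle inequality lets a range query skip expensive distance computations.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric (an assumption; any metric distance would do)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_table(data: Sequence[Point], vantage_points: Sequence[Point],
                dist: Metric) -> List[List[float]]:
    """Precompute d(p, v) for every data point p and vantage point v."""
    return [[dist(p, v) for v in vantage_points] for p in data]


def range_search(query: Point, radius: float, data: Sequence[Point],
                 vantage_points: Sequence[Point], table: List[List[float]],
                 dist: Metric) -> Tuple[List[Point], int]:
    """Return (points within radius of query, number of distance computations)."""
    q_to_v = [dist(query, v) for v in vantage_points]
    computed = len(vantage_points)
    hits: List[Point] = []
    for p, row in zip(data, table):
        # Triangle inequality: |d(q, v) - d(p, v)| <= d(q, p).
        # If that lower bound exceeds the radius, p cannot be a match.
        if any(abs(qv - pv) > radius for qv, pv in zip(q_to_v, row)):
            continue
        computed += 1
        if dist(query, p) <= radius:
            hits.append(p)
    return hits, computed


# Hypothetical sample data.
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0), (2.0, 1.0)]
vantage_points = [(0.0, 0.0), (9.0, 9.0)]
table = build_table(data, vantage_points, euclidean)
hits, computed = range_search((1.0, 1.0), 1.5, data, vantage_points,
                              table, euclidean)
```

Each extra vantage point costs one more distance computation per query but tightens the filter, which is the trade-off the experiments on the number of vantage points explore.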
Finally, we provide some experimental results that compare mvp-trees with M-trees, a dynamic distance-based index structure for metric domains. A preliminary version of this paper appeared in
The aim is to process distributed queries efficiently. The cost of communication between sites is dominant in processing such queries. It is assumed that the amount of data transferred largely determines the transmission cost, so it is desirable to minimize the amount of transmitted data. Bernstein and Chiu [2] classified queries into two types: tree queries and cyclic queries. They defined an operation called the semi-join, which requires minimal transfer of data between sites, and showed that tree queries can always be answered by semi-joins but cyclic queries may not. Their paper presented an algorithm to decide whether a query is cyclic. That algorithm works when any two relations have no more than one domain in common. The aim of this paper is to generalize their algorithm. Specifically, we present a conceptually simple algorithm that decides the type of a query when the number of domains common to two relations may exceed one. The algorithm runs in O(max(e, e')) time and O(e) space, where e and e' are the number of edges in the transitive closure of the join graph and in the query graph, respectively.
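As a rough illustration of why the semi-join mentioned above keeps data transfer small, here is a minimal sketch. It does not implement the tree/cyclic decision algorithm; the relation contents, the attribute name `dept`, and the function `semi_join` are hypothetical. The idea shown is that only the join column of one relation needs to be shipped to the other site, which then discards the rows that cannot take part in the join.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def project(rows: List[Row], attr: str) -> Set[Any]:
    """Distinct values of one attribute: the only data shipped between sites."""
    return {row[attr] for row in rows}


def semi_join(r: List[Row], shipped_keys: Set[Any], attr: str) -> List[Row]:
    """Keep the rows of r whose join value appears among the shipped keys."""
    return [row for row in r if row[attr] in shipped_keys]


# Hypothetical relations stored at two different sites.
employees_at_site_1 = [
    {"emp": "ann", "dept": 1},
    {"emp": "bob", "dept": 2},
    {"emp": "cy", "dept": 3},
]
departments_at_site_2 = [
    {"dept": 1, "city": "Oslo"},
    {"dept": 3, "city": "Rome"},
]

# Site 2 sends only its dept values; site 1 reduces its relation locally.
keys_sent = project(departments_at_site_2, "dept")
reduced_employees = semi_join(employees_at_site_1, keys_sent, "dept")
```

After the reduction, only the two surviving employee rows would need to be transmitted to complete the join, rather than all three.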