The LSD^h-tree: an access structure for feature vectors

Henrich, Andreas

doi:10.1109/icde.1998.655799

Cited by 57 publications

(48 citation statements)

References 20 publications


“…Even a number of specialized index structures for high-dimensional data spaces have been proposed [6,12,22,28,30,36]. In spite of these efforts, there are still high-dimensional indexing problems under which even specialized index structures deteriorate in performance.…”

Section: Motivation

mentioning

confidence: 99%

Dynamically Optimizing High-Dimensional Index Structures

Böhm

Kriegel

2000

Advances in Database Technology — EDBT 2000


Abstract. In high-dimensional query processing, the optimization of the logical page-size of index structures is an important research issue. Even very simple query processing techniques such as the sequential scan are able to outperform indexes which are not suitably optimized. Page-size optimization based on a cost model faces the problem that the optimum depends not only on static schema information such as the dimension of the data space but also on dynamically changing parameters such as the number of objects stored in the database and the degree of clustering and correlation in the current data set. Therefore, we propose a method for adapting the page size of an index dynamically during insert processing. Our solution, called DABS-tree, uses a flat directory whose entries consist of an MBR, a pointer to the data page and the size of the data page. Before splitting pages in insert operations, a cost model is consulted to estimate whether the split operation is beneficial. Otherwise, the split is avoided and the logical page-size is adapted instead. A similar rule applies for merging when performing delete operations. We present an algorithm for the management of data pages with varying page-sizes in an index and show that all restructuring operations are locally restricted. We show in our experimental evaluation that the DABS-tree outperforms the X-tree by a factor up to 4.6 and the sequential scan by a factor up to 6.6.

Motivation

Query processing in high-dimensional data spaces is an emerging research domain which gains increasing importance by the need to support modern applications by powerful search tools. In the so-called non-standard applications of database systems such as multimedia [16,33,34], CAD [11,13,21,25], molecular biology [26,29], medical imaging [27], time series analysis [1,2,18], and many others, similarity search in large data sets is required as a basic functionality. A technique widely applied for similarity search is the so-called feature transformation, where important properties of the objects in the database are mapped into points of a multidimensional vector space, the so-called feature vectors. Thus, similarity queries are naturally translated into neighborhood queries in the feature space. In order to achieve a high performance in query processing, multidimensional index structures [20] are applied for the management of the feature vectors. Even a number of specialized index structures for high-dimensional data spaces have been proposed [6,
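The split rule described in the abstract above (a flat directory of entries holding an MBR, a page reference and a page size, with a cost model consulted before each split) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not the DABS-tree implementation: `DirectoryEntry`, `make_entry` and `insert` are hypothetical names, the cost model is an opaque callable `split_is_beneficial` supplied by the caller, the split is a median split along the widest MBR dimension, and "adapting" the page size is modeled as doubling it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, ...]


@dataclass
class DirectoryEntry:
    """One flat-directory entry: MBR, data page, and logical page size."""
    mbr_low: Point
    mbr_high: Point
    page: List[Point]   # stands in for a pointer to the data page
    page_size: int      # logical capacity of the page, counted in points


def make_entry(points: List[Point], page_size: int) -> DirectoryEntry:
    """Build an entry whose MBR tightly encloses the given points."""
    low = tuple(min(c) for c in zip(*points))
    high = tuple(max(c) for c in zip(*points))
    return DirectoryEntry(low, high, list(points), page_size)


def insert(directory: List[DirectoryEntry], index: int, point: Point,
           split_is_beneficial: Callable[[DirectoryEntry], bool]) -> None:
    """Insert a point into the page at `index`, handling overflow."""
    old = directory[index]
    entry = make_entry(old.page + [point], old.page_size)
    if len(entry.page) <= entry.page_size:
        directory[index] = entry            # no overflow: just store it
    elif split_is_beneficial(entry):        # hypothetical cost-model callback
        # Assumed split policy: median split along the widest MBR dimension.
        axis = max(range(len(point)),
                   key=lambda d: entry.mbr_high[d] - entry.mbr_low[d])
        ordered = sorted(entry.page, key=lambda p: p[axis])
        mid = len(ordered) // 2
        directory[index:index + 1] = [
            make_entry(ordered[:mid], old.page_size),
            make_entry(ordered[mid:], old.page_size),
        ]
    else:
        # Split avoided: adapt the logical page size (doubling is assumed).
        entry.page_size *= 2
        directory[index] = entry
```

Passing the cost model as a callback keeps the sketch independent of any particular cost formula, which the abstract does not spell out.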


“…The directory of the LSD^h-tree [Hen98] is also an adaptive kd-tree [Ben75,Ben79]. In contrast to R-tree variants and k-d-B-tree, the region description is coded in a sophisticated way leading to reduced space requirement for the region description.…”

Section: Structures With a Kd-tree Directory

mentioning

confidence: 99%

“…In general, the index structures can be classified in two groups: Data organizing structures such as R-trees [Gut84,BKSS90] and kd-tree-based methods (k-d-B-tree [Rob81], hB-tree [LS89,LS90,Eva94], and LSD^h-tree [Hen98]). …”

Section: Definition 3 (-Similarity Query Nn-similarity Query)

mentioning

confidence: 99%

Searching in high-dimensional spaces

Böhm

Berchtold

2001


During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, or molecular biology. An important research issue in the field of multimedia databases is the content based retrieval of similar multimedia objects such as images, text, and videos. However, in contrast to searching data in a relational database, a content based retrieval requires the search of similar objects as a basic functionality of the database system. Most of the approaches addressing similarity search use a so-called feature transformation which transforms important properties of the multimedia objects into high-dimensional points (feature vectors). Thus, the similarity search is transformed into a search of points in the feature space which are close to a given query point in the high-dimensional feature space. Query processing in high-dimensional spaces has therefore been a very active research area over the last few years. A number of new index structures and algorithms have been proposed. It has been shown that the new index structures considerably improve the performance in querying large multimedia databases. Based on recent tutorials [BK98, BK00], in this survey we provide an overview of the current state-of-the-art in querying multimedia databases, describing the index structures and algorithms for an efficient query processing in high-dimensional spaces. We identify the problems of processing queries in high-dimensional space, and we provide an overview of the proposed approaches to overcome these problems.

Indexing Multimedia Databases

Multimedia databases are of high importance in many application areas such as geography, CAD, medicine, or molecular biology. Depending on the application, the multimedia databases need to have different properties and need to support different types of queries. In contrast to traditional database applications, where point, range, and partial match queries are very important, multimedia databases require a search for all objects in the database which are similar (or complementary) to a given search object. In the following, we describe the notion of similarity queries and the feature-based approach to process those queries in multimedia databases in more detail.


“…For this, we adopt the "distance browsing" concept proposed in [12], through which it is possible to efficiently access data points in increasing order of distance from the query point. It is predicated on having an index structure with containment property, such as R-Tree [10], R*-Tree [1], LSD-trees [11], etc., built collectively on all dimensions of the database (more precisely, we need the index to only cover those dimensions on which point predicates appear in the query workload). This assumption appears practical since current database systems such as Oracle, natively support R-trees [14].…”

Section: Distance Browsing

mentioning

confidence: 99%

Providing Diversity in K-Nearest Neighbor Query Results

Jain

Sarda

Haritsa

2004

Advances in Knowledge Discovery and Data Mining


Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K closest answers according to given distance metric in the database with respect to Q. In this scenario, it is possible that a majority of the answers may be very similar to some other, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, to produce the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance.
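The citation statement for this paper relies on "distance browsing": visiting data points in increasing order of distance from a query point, using an index with the containment property. A minimal sketch of that idea follows, assuming Euclidean distance and a toy bounding-box hierarchy. `Node`, `make_node`, `min_dist` and `browse` are hypothetical names introduced for illustration. This is neither the MOTLEY algorithm nor any specific index structure from the cited papers; taking the first K results simply yields a plain KNN answer.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Node:
    """Hypothetical index node: a bounding box plus children (nodes or points)."""
    low: Point
    high: Point
    children: List[Union["Node", Point]] = field(default_factory=list)


def make_node(children: List[Union[Node, Point]]) -> Node:
    """Build a node whose box contains everything below it (containment)."""
    lows = [c.low if isinstance(c, Node) else c for c in children]
    highs = [c.high if isinstance(c, Node) else c for c in children]
    low = tuple(min(v) for v in zip(*lows))
    high = tuple(max(v) for v in zip(*highs))
    return Node(low, high, list(children))


def min_dist(q: Point, node: Node) -> float:
    """Smallest Euclidean distance from q to any location inside the box."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, node.low, node.high)))


def browse(root: Node, q: Point) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, point) pairs in increasing order of distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares items
    heap = [(min_dist(q, root), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand the node: its box distance lower-bounds all its contents.
            for child in item.children:
                d = (min_dist(q, child) if isinstance(child, Node)
                     else math.dist(q, child))
                heapq.heappush(heap, (d, next(tie), child))
        else:
            yield dist, item  # no unvisited point can be closer than this one


left = make_node([(0.0, 0.0), (1.0, 1.0)])
right = make_node([(5.0, 5.0), (6.0, 4.0)])
root = make_node([left, right])
nearest_two = [p for _, p in itertools.islice(browse(root, (0.9, 0.9)), 2)]
```

Because the generator is lazy, a consumer can stop after K points or keep pulling further candidates, which is what makes this access pattern useful for filtering results beyond plain distance.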

