Metric Space Searching Based on Random Bisectors and Binary Fingerprints

Andrade, J. M. V. de; Astudillo, César; Paredes, Rodrígo

doi:10.1007/978-3-319-11988-5_5

Cited by 3 publications

References 9 publications

Querying Metric Spaces with Bit Operations

Connor; Dearle (2018). Similarity Search and Applications.

Metric search techniques can be usefully characterised by the time at which distance calculations are performed during a query. Most exact search mechanisms use a "just-in-time" approach, where distances are calculated as part of a navigational strategy. An alternative is a "one-time" approach, where distances to a fixed set of reference objects are calculated at the start of each query. These distances are typically used to re-cast data and queries into a different space where querying is more efficient, allowing an approximate solution to be obtained. In this paper we use a "one-time" approach for an exact search mechanism. A fixed set of reference objects is used to define a large set of regions within the original space, and each query is assessed with respect to the definition of these regions. Data is then accessed if, and only if, it is useful for the calculation of the query solution. As dimensionality increases, the number of defined regions must increase, but the memory required for the exclusion calculation does not. We show that the technique gives excellent performance over the SISAP benchmark data sets and, most interestingly, that increases in dimensionality may be countered by relatively modest increases in the number of reference objects used.

1 Context

To set a formal context, we are interested in searching a (large) finite set of objects S, which is a subset of an infinite set U, where (U, d) is a metric space: that is, an ordered pair (U, d), where U is a domain of objects and d is a total distance function d : U × U → ℝ satisfying the postulates of non-negativity, identity, symmetry, and triangle inequality [20]. The general requirement is to efficiently find members of S that are similar to an arbitrary member of U given as a query, where the distance function d gives the only way by which any two objects may be compared. There are many important practical examples captured by this mathematical framework; see, for example, [16, 20].
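The four postulates named above can be made concrete with a small check. This is an illustration only, assuming objects are points in the plane; `check_metric_postulates`, `euclidean` and `squared_euclidean` are hypothetical names, and passing the check on a finite sample is evidence, not a proof, that d is a metric on all of U.

```python
import itertools
import math
from typing import Any, Callable, Sequence


def check_metric_postulates(
    sample: Sequence[Any], d: Callable[[Any, Any], float], eps: float = 1e-9
) -> bool:
    """Test the metric postulates on every pair and triple of a finite sample."""
    for x, y in itertools.product(sample, repeat=2):
        dxy = d(x, y)
        if dxy < -eps:                      # non-negativity
            return False
        if (x == y) != (abs(dxy) <= eps):   # identity: d(x, y) = 0 iff x = y
            return False
        if abs(dxy - d(y, x)) > eps:        # symmetry
            return False
    for x, y, z in itertools.product(sample, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + eps:  # triangle inequality
            return False
    return True


def euclidean(a, b) -> float:
    return math.dist(a, b)


def squared_euclidean(a, b) -> float:
    # Hypothetical counter-example: not a metric.
    return math.dist(a, b) ** 2


print(check_metric_postulates([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)], euclidean))  # True
print(check_metric_postulates([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], squared_euclidean))  # False
```

Squared Euclidean distance satisfies non-negativity, identity and symmetry but fails the triangle inequality: for the collinear points (0, 0), (1, 0), (2, 0) it gives 4 > 1 + 1.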
The simplest type of similarity query is the range search query: for some threshold t, based on a query q ∈ U, the solution set is R = {s ∈ S | d(q, s) ≤ t}. The essence of metric search is to spend time pre-processing the finite set S so that solutions to queries can be efficiently calculated using only distances among objects. In all cases, therefore, distances between the data and selected …

This is a post-peer-review, pre-copyedit version of a paper published in Marchand-Maillet S., Silva Y., Chávez E.
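A range query can be answered exactly without computing d(q, s) for every s. The following is a minimal sketch of generic pivot-based filtering (in the style of a LAESA pivot table), not the region-based mechanism described in the abstract above. It assumes points in the plane with `math.dist` as d; `PivotTable`, `range_search` and the chosen reference objects are hypothetical. Distances from the data to a few reference objects are computed once during pre-processing, the query's distances to the same objects are computed once at the start of the query, and the triangle inequality then discards objects that cannot be in R.

```python
import math
from typing import Any, Callable, List, Sequence

Metric = Callable[[Any, Any], float]


class PivotTable:
    """Hypothetical pivot table: data-to-reference distances are computed
    once, at build time, while pre-processing S."""

    def __init__(self, data: Sequence[Any], refs: Sequence[Any], d: Metric):
        self.data, self.refs, self.d = list(data), list(refs), d
        self.table = [[d(s, p) for p in self.refs] for s in self.data]
        self.distance_calls = 0  # query-time calls to d, reset per query

    def _dist(self, a: Any, b: Any) -> float:
        self.distance_calls += 1
        return self.d(a, b)

    def range_search(self, q: Any, t: float) -> List[Any]:
        """Return R = {s in S | d(q, s) <= t} exactly."""
        self.distance_calls = 0
        dq = [self._dist(q, p) for p in self.refs]  # the "one-time" step
        result = []
        for s, row in zip(self.data, self.table):
            # Triangle inequality: |d(q, p) - d(s, p)| <= d(q, s) for every p,
            # so any gap larger than t proves that s is not a solution.
            if any(abs(a - b) > t for a, b in zip(dq, row)):
                continue
            if self._dist(q, s) <= t:  # verify each surviving candidate
                result.append(s)
        return result


pts = [(float(i), 0.0) for i in range(10)]
index = PivotTable(pts, refs=[(0.0, 0.0), (9.0, 0.0)], d=math.dist)
print(index.range_search((4.2, 0.0), t=1.0))  # [(4.0, 0.0), (5.0, 0.0)]
print(index.distance_calls)                   # 4
```

In this toy case only 4 calls to d are made at query time (2 to reference objects, 2 to verify candidates), against 10 for a sequential scan; the saving in general depends on how well the reference objects separate the data.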

Distance-Based Index Structures for Fast Similarity Search

Rachkovskij (2017). Cybernetics and Systems Analysis.

Bitpart: Exact metric search in high(er) dimensions

Dearle; Connor (2021). Information Systems.

We define BitPart (Bitwise representations of binary Partitions), a novel exact search mechanism intended for use in high-dimensional spaces. In outline, a fixed set of reference objects is used to define a large set of regions within the original space, and each data item is characterised according to its containment within these regions. In contrast with other mechanisms, only a subset of this information is selected, according to the query, before a search within the recast space is performed. Partial data representations are accessed only if they are known to be potentially useful towards the calculation of the exact query solution. Our mechanism requires Ω(N log N) space to evaluate a query, where N is the cardinality of the data, and therefore does not scale as well as previously defined mechanisms with low-dimensional data. However, it has recently been shown that, for a nearest-neighbour search in high dimensions, a sequential scan of the data is essentially unavoidable. This result has long been suspected and, in this context, has been referred to as the curse of dimensionality. In the light of this result, the compromise achieved by this work is to make the best possible use of the available fast memory, and to offer great potential for parallel query evaluation. To our knowledge, it gives the best compromise currently known for performing exact search over data whose dimensionality is too high to allow the useful application of metric indexing, yet still sufficiently low to give at least some traction from the metric and supermetric properties.
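The idea of bitwise representations of binary partitions can be sketched as follows, as an illustration under stated assumptions rather than the authors' implementation. The sketch assumes two kinds of region (a ball of one fixed `radius` around each reference object, and the bisector between each pair of reference objects), stores each region as a Python `int` used as a bitset over the data, and uses `math.dist` on points in the plane; `build_regions`, `bitpart_range_search` and `radius` are hypothetical. At query time only the regions that the query ball falls wholly inside or wholly outside are selected, their bitsets are combined with bitwise AND, and only the surviving candidates are checked with d.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Metric = Callable[[Any, Any], float]
Region = Tuple[str, int, int, int]  # (kind, ref i, ref k, bitset over data)


def build_regions(data: Sequence[Any], refs: Sequence[Any], d: Metric,
                  radius: float) -> List[Region]:
    """Hypothetical region set: a ball of fixed radius around each reference
    object, plus the bisector between each pair of reference objects.
    Bit j of a region's bitset is set iff data[j] lies inside that region."""
    dist = [[d(s, p) for p in refs] for s in data]  # build-time distances only
    regions: List[Region] = []
    for i in range(len(refs)):
        bits = sum(1 << j for j, row in enumerate(dist) if row[i] <= radius)
        regions.append(("ball", i, -1, bits))
    for i in range(len(refs)):
        for k in range(i + 1, len(refs)):
            # Inside = strictly closer to refs[i] than to refs[k].
            bits = sum(1 << j for j, row in enumerate(dist) if row[i] < row[k])
            regions.append(("bisector", i, k, bits))
    return regions


def bitpart_range_search(q: Any, t: float, data: Sequence[Any],
                         refs: Sequence[Any], d: Metric, radius: float,
                         regions: List[Region]) -> List[Any]:
    """Exact range query: AND together the bitsets of the regions the query
    decides, then verify the surviving candidates with d."""
    dq = [d(q, p) for p in refs]       # "one-time" distances, once per query
    candidates = (1 << len(data)) - 1  # start with every data item
    for kind, i, k, bits in regions:
        if kind == "ball":
            if dq[i] + t <= radius:    # query ball lies wholly inside the ball
                candidates &= bits
            elif dq[i] - t > radius:   # query ball lies wholly outside it
                candidates &= ~bits
        else:
            diff = dq[i] - dq[k]
            if diff < -2 * t:          # every solution is closer to refs[i]
                candidates &= bits
            elif diff > 2 * t:         # every solution is closer to refs[k]
                candidates &= ~bits
        # Otherwise the query straddles the boundary and the region is skipped.
    return [s for j, s in enumerate(data)
            if (candidates >> j) & 1 and d(q, s) <= t]


pts = [(float(i), 0.0) for i in range(10)]
refs = [(0.0, 0.0), (9.0, 0.0), (4.0, 3.0)]
regions = build_regions(pts, refs, math.dist, radius=3.0)
print(bitpart_range_search((4.2, 0.0), 0.5, pts, refs, math.dist, 3.0, regions))
# [(4.0, 0.0)]
```

The bisector test relies on the fact that, for any s with d(q, s) ≤ t, the difference d(s, p_i) - d(s, p_k) lies within 2t of d(q, p_i) - d(q, p_k), so a gap larger than 2t fixes the side of the bisector for every solution. In the toy query above, the selected regions leave only two candidates out of ten for verification.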
