Similarity sets: A new concept of sets to seamlessly handle similarity in database management systems

Pola, Ives Renê Venturini; Cordeiro, Robson L. F.; Traina, Caetano; Traina, Agma J. M.

doi:10.1016/j.is.2015.01.011

Cited by 8 publications

(9 citation statements)

References 31 publications

“…Many researchers have been proposing strategies to support similarity comparison in Relational Database Management Systems (RDBMS) [Silva et al 2015, Pola et al 2015, Budíková et al 2012, Belohlavek and Vychodil 2010], commonly by extending Relational Operators. The vast majority of them focus on the Selection [Silva et al 2013], in which similarity awareness is achieved by means of range queries, nearest-neighbor queries, and their many variants.…”

Section: Basic Concepts and Related Work (mentioning)

confidence: 99%

The similarity-aware relational division database operator

Gonzaga

Cordeiro

2018

Anais Do Concurso De Teses E Dissertações Da SBC (CTD-SBC)

This paper describes the motivation, contributions and impact of the MSc dissertation that proposes the first Similarity-aware Division (÷) database operator. The novel operator is naturally well suited to answering queries built on the idea of "candidate elements and exigencies" over complex data from high-impact real applications, such as agriculture, genetics, industrial production, digital libraries and enterprise management.

“…For example, range and nearest-neighbor queries are unary operators that retrieve similar elements based on a list of parameters, such as the query center, the radius, or the number of nearest neighbors to find. Another class of unary operators can be employed for data set extraction, such as the SimSet [1] extraction technique, which filters a data set by eliminating near-duplicates based on a given radius threshold.…”

Section: Introduction (mentioning)

confidence: 99%

Double Distance-Calculation-Pruning for Similarity Search

Pola

Eler

2018

Information

Abstract: Many modern applications deal with complex data, where retrieval by similarity plays an important role. The main comparison mechanisms for complex data are based on similarity predicates. The data are usually embedded in metric spaces, where distance functions express the similarity and a lower bound property is usually employed to avoid distance calculations. Retrieval by similarity is implemented by unary and binary operators. Most studies aim at improving the efficiency of unary operators, either by using metric access methods or by using mathematical properties to prune parts of the search space during query answering. Studies on binary operators that solve similarity joins also aim to improve efficiency, and most of them use only the metric lower bound property for pruning. However, they depend on the query parameters, such as the range radius. In this paper, we propose a generic concept that uses both lower and upper bound properties based on Metric Spaces Theory to avoid more element comparisons. The concept can be applied to any existing similarity retrieval method. We analyze the increase in pruning power and show an example of its application to classical nested-loop join algorithms. Practical evaluation over both synthetic and real data sets shows that our method reduces the number of distance evaluations in similarity joins.
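
To make the idea of combining lower and upper bounds concrete, the following is a minimal sketch of a nested-loop similarity range join that uses the triangle inequality with a single pivot. It is an illustration under stated assumptions, not the algorithm from the paper: the function names (`range_join`, `brute_force_join`), the single-pivot scheme, and the Euclidean distance are all choices made here for the example.

```python
import math


def brute_force_join(left, right, radius, dist=math.dist):
    """Reference join: computes every pairwise distance."""
    return [(a, b) for a in left for b in right if dist(a, b) <= radius]


def range_join(left, right, radius, pivot, dist=math.dist):
    """Nested-loop range join with pivot-based bound checks (hypothetical sketch).

    For a metric d and any pivot p, the triangle inequality gives
        |d(a, p) - d(b, p)| <= d(a, b) <= d(a, p) + d(b, p).
    If the lower bound exceeds the radius, the pair is discarded;
    if the upper bound is within the radius, the pair is accepted.
    Only the remaining pairs need an actual distance computation.
    """
    # Distances to the pivot are computed once per element.
    left_to_pivot = [dist(a, pivot) for a in left]
    right_to_pivot = [dist(b, pivot) for b in right]

    pairs = []
    computed = 0  # number of pairwise distance evaluations actually performed
    for a, da in zip(left, left_to_pivot):
        for b, db in zip(right, right_to_pivot):
            if abs(da - db) > radius:   # lower bound: cannot qualify
                continue
            if da + db <= radius:       # upper bound: must qualify
                pairs.append((a, b))
                continue
            computed += 1
            if dist(a, b) <= radius:
                pairs.append((a, b))
    return pairs, computed
```

The counter `computed` excludes the distances to the pivot, which are paid once per element rather than once per pair. How much the upper bound helps depends on the pivot choice and on the radius.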

“…Today, many real data sets include, besides the traditional numeric values and small texts, more complex data objects such as images, audio files, videos, time series, genetic data elements, large graphs, long texts, fingerprints, and many others (POLA et al, 2013; ZEZULA et al, 2006; SILVA et al, 2010). One central distinction between traditional and complex data is that the latter must be compared by similarity, since comparisons by identity (=) are in most cases senseless and/or unfeasible for data of a more complex nature (MARRI et al, 2014; MARRI et al, 2016; JACOX; SAMET, 2008; KALASHNIKOV, 2013; POLA et al, 2015; SILVA et al, 2013; SILVA et al, 2010; TANG et al, 2016a). To illustrate this fact, let us consider again the division query about cities and crops, but now using a more realistic variation of our toy dataset in which we have neither cities carefully partitioned into regions nor textual tags ready to be used to describe each region.…”

Section: Problem and Motivation (mentioning)

confidence: 99%

“…Many researchers have been proposing strategies to support similarity comparison in Relational Database Management Systems (RDBMS) (SILVA et al, 2010; POLA et al, 2013; BUDÍKOVÁ; ZEZULA, 2012; BARIONI et al, 2009; BELOHLAVEK; VYCHODIL, 2010), commonly by means of extending operators of the Relational Algebra. For example, recent works focus on the Join (SILVA et al, 2015; KALASHNIKOV, 2013; SILVA; PEARSON, 2012; SILVA; AREF; ALI, 2010), Selection (SILVA et al, 2013; SANTOS et al, 2013), Grouping and Aggregation (TANG et al, 2016a; TANG et al, 2016b; ALI, 2009), Union (POLA et al, 2015; MARRI et al, 2016), Intersection (POLA et al, 2015; MARRI et al, 2014; MARRI et al, 2016) and Difference (POLA et al, 2015; MARRI et al, 2016). However, to the best of our knowledge, no one focuses on the Division.…”

Section: Problem and Motivation (mentioning)

confidence: 99%

“…The relational set-based operators have also been studied. Pola et al (POLA et al, 2015; POLA et al, 2013) introduced the SimSets, which are sets of data without any pair of similar elements with regard to a given distance function and a similarity threshold. The function must be a metric, thus the usual absence of identical elements is maintained.…”

Section: Similarity-awareness In the Relational Algebra (mentioning)

confidence: 99%
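
The SimSet definition quoted above (no pair of elements closer than a similarity threshold under a metric) can be illustrated with a short sketch. The greedy, order-dependent extraction policy below is an assumption made for this example, as are the function names `extract_simset` and `is_simset` and the choice of treating a distance less than or equal to the threshold as "similar"; the original paper defines its own extraction policies.

```python
import math


def is_simset(elements, threshold, dist=math.dist):
    """True if no two elements are within `threshold` of each other."""
    return all(
        dist(a, b) > threshold
        for i, a in enumerate(elements)
        for b in elements[i + 1:]
    )


def extract_simset(elements, threshold, dist=math.dist):
    """Greedy extraction (hypothetical policy): scan the input in order and
    keep an element only if it is not similar to any element kept so far.

    Because dist(x, x) == 0 for a metric, identical elements are always
    similar, so the result never contains duplicates.
    """
    kept = []
    for e in elements:
        if all(dist(e, k) > threshold for k in kept):
            kept.append(e)
    return kept


points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.0), (10.0, 10.0)]
simset = extract_simset(points, threshold=1.0)
```

With this policy the result depends on the input order: a different scan order can keep a different representative of each group of near-duplicates.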

The Similarity-aware Relational Division Database Operator

Gonzaga

The Division operator (÷) of Relational Algebra makes it simple to express queries involving the "for all" concept, and is therefore required in many real applications. However, this MSc work shows that division does not meet the needs of many current applications, especially those that analyze complex data such as images, audio, long texts and fingerprints, among others. An analysis of the problem shows that the main limitation is the attribute-value comparisons intrinsic to Relational Division, which, by definition, are always performed by identity (=), whereas complex objects must generally be compared by similarity. The literature today offers proposals for relational operators that support similarity of complex objects; however, none of them addresses Relational Division. This MSc work proposes to investigate and extend the Division operator of Relational Algebra to better suit the demands of current applications, by supporting similarity-based comparisons of attribute values. It is shown here that Similarity Division is naturally suited to answering a variety of queries built on a concept of "candidate elements and exigencies" described in the monograph, involving complex data from high-impact real applications, with the potential, for example, to support agriculture, genetic data analysis and searches in digital libraries, and even to control the quality of manufactured products and to identify new clients in industry. To validate the proposal, the first two applications mentioned are to be studied.
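
The contrast between identity-based and similarity-based division can be sketched in a few lines. This is a simplified reading for illustration only, not the formal operator defined in the dissertation: the function names `divide` and `similarity_divide`, the representation of the dividend as `(candidate, value)` pairs, and the single distance threshold are assumptions of this example.

```python
import math
from collections import defaultdict


def divide(dividend, divisor):
    """Classic relational division: candidates whose values include
    every required value, compared by identity (=)."""
    groups = defaultdict(set)
    for candidate, value in dividend:
        groups[candidate].add(value)
    required = set(divisor)
    return {c for c, values in groups.items() if required <= values}


def similarity_divide(dividend, divisor, threshold, dist=math.dist):
    """Similarity-based variant (hypothetical sketch): a candidate qualifies
    if, for every required value, it has some value within `threshold`."""
    groups = defaultdict(list)
    for candidate, value in dividend:
        groups[candidate].append(value)
    return {
        c
        for c, values in groups.items()
        if all(any(dist(v, req) <= threshold for v in values) for req in divisor)
    }


# Candidates A, B, C with 2-D feature values; two requirements.
dividend = [
    ("A", (0.0, 0.0)), ("A", (5.0, 5.0)),
    ("B", (0.1, 0.0)), ("B", (5.0, 5.2)),
    ("C", (0.0, 0.0)),
]
divisor = [(0.0, 0.0), (5.0, 5.0)]

exact = divide(dividend, divisor)
similar = similarity_divide(dividend, divisor, threshold=0.5)
```

In this toy data, candidate B has no value identical to either requirement but has a close match for each, so it is rejected by identity and accepted by similarity; candidate C lacks any match for the second requirement under both comparisons.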
