Estimating Recall and Precision for Vague Queries in Databases

Stasiu, Raquel Kolitski; Heuser, Carlos A.; Silva, Roberto da

doi:10.1007/11431855_14

Cited by 12 publications

(20 citation statements)

References 31 publications

“…Further, the experiments demonstrate how the proposed automatic approach leads to results that are close to those obtained by the approach that we developed previously [33], which requires human intervention.…”

Section: Introduction (supporting)

confidence: 74%

Automatic threshold estimation for data matching applications

Santos, Heuser, Moreira et al. 2011

Information Sciences

Self Cite

“…In previous work [33], we proposed a procedure for reducing human intervention. In that procedure, instead of generating the clusters manually (Step 2 of the procedure above), the human expert just informs how many distinct real world objects are represented by the instances in the samples taken from the dataset.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic threshold estimation for data matching applications

Santos, Heuser, Moreira et al. 2011

Information Sciences

Self Cite

“…However, it is worth pointing out that this sampling process could be automated through the use of clustering algorithms, as done in our previous work (Stasiu, Heuser, & da Silva, 2005). In this case, all the elements of a given cluster are considered as representing the same real world object.…”

Section: Discussion (mentioning)

confidence: 99%

Measuring quality of similarity functions in approximate data matching

Silva, Stasiu, Moreira et al. 2007

Journal of Informetrics

Self Cite

This paper presents a method for assessing the quality of similarity functions. The scenario taken into account is that of approximate data matching, in which it is necessary to determine whether two data instances represent the same real world object. Our method is based on the semi-automatic estimation of optimal threshold values. We propose two methods for performing such estimation. The first method is an algorithm based on a reward function, and the second is a statistical method. Experiments were carried out to validate the techniques proposed. The results show that both methods for threshold estimation produce similar results. The output of such methods was used to design a grading function for similarity functions. This grading function, called discernability, was used to compare a number of similarity functions applied to an experimental data set.

“…Based on collection samples, a semi-automatic approach for the estimation of recall and precision values for various similarity thresholds minimizes efforts involved by static similarity threshold definitions [28,29]. It requires expert input only where the number of distinct objects contained in each sample is concerned and uses two techniques to reduce human interaction, namely (i) sample use and (ii) similarity cluster process.…”

Section: Related Work (mentioning)

confidence: 99%
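As an illustration of what "recall and precision values for various similarity thresholds" means in a data matching setting, the following is a minimal sketch and not the procedure of the cited work. It assumes a small sample with known object identifiers (the cited approach only asks the expert for the number of distinct objects), and it uses Python's difflib ratio as a stand-in similarity function. The names similarity, precision_recall and SAMPLE are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity function in [0, 1] (an assumption, not from the cited work)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def precision_recall(sample, threshold):
    """Pairwise precision and recall at one similarity threshold.

    sample: list of (value, object_id) pairs with known ground truth.
    A pair is 'retrieved' when its similarity is >= threshold and
    'relevant' when both values carry the same object_id.
    """
    tp = fp = fn = 0
    for (va, ida), (vb, idb) in combinations(sample, 2):
        retrieved = similarity(va, vb) >= threshold
        relevant = ida == idb
        if retrieved and relevant:
            tp += 1
        elif retrieved:
            fp += 1
        elif relevant:
            fn += 1
    # Convention used here: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical sample: five name variants referring to two real world objects.
SAMPLE = [
    ("Carlos A. Heuser", 1),
    ("Heuser, Carlos A.", 1),
    ("C. A. Heuser", 1),
    ("Roberto da Silva", 2),
    ("R. da Silva", 2),
]

if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        p, r = precision_recall(SAMPLE, t)
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold can only shrink the set of retrieved pairs, so recall never increases as the threshold grows. Precision usually rises but is not guaranteed to.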

“…A new approach [5] combines two strategies to eliminate human intervention [28,29], during the recall and precision values estimation process. They are (i) use of agglomerative hierarchical clustering algorithms and (ii) use of the silhouette coefficient for cluster evaluation.…”

Section: Related Work (mentioning)

confidence: 99%
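The two ingredients named in this statement, agglomerative clustering and the silhouette coefficient, can be sketched as follows. This is an illustration under assumptions and not the cited implementation: the statement does not say which linkage criterion or distance is used, so the sketch picks single linkage and a distance of one minus the difflib ratio. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    """Assumed distance: 1 minus a stand-in string similarity."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def single_link_clusters(items, max_dist):
    """Single-linkage agglomerative clustering, cut at max_dist.

    Two clusters are merged whenever any cross pair lies within max_dist;
    merging repeats until no such pair of clusters remains.
    """
    clusters = [[x] for x in items]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(distance(a, b) <= max_dist
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters


def mean_silhouette(clusters):
    """Mean silhouette coefficient; 0.0 for singletons and for a single cluster."""
    if len(clusters) < 2:
        return 0.0
    scores = []
    for ci, cluster in enumerate(clusters):
        for x in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # distance(x, x) is 0, so summing over the whole cluster is safe.
            a = sum(distance(x, y) for y in cluster) / (len(cluster) - 1)
            b = min(sum(distance(x, y) for y in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_cut(items, candidates):
    """Pick the candidate cut whose clustering has the highest mean silhouette."""
    return max(candidates,
               key=lambda d: mean_silhouette(single_link_clusters(items, d)))


if __name__ == "__main__":
    items = ["aaaa", "aaab", "zzzz", "zzzy"]
    cut = best_cut(items, [0.0, 0.3, 1.0])
    print(cut, single_link_clusters(items, cut))
```

Each resulting cluster can then be treated as one real world object, which is how a clustering stands in for the manual grouping step when precision and recall are estimated.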

Automatic and online setting of similarity thresholds in content-based visual information retrieval problems

Bessas, Pádua, Assis et al. 2016

EURASIP J. Adv. Signal Process.

Several information recovery systems use functions to determine similarity among objects in a collection. Such functions require a similarity threshold, from which it becomes possible to decide on the similarity between two given objects. Thus, depending on its value, the results returned by systems in a search may be satisfactory or not. However, the definition of similarity thresholds is difficult because it depends on several factors. Typically, specialists fix a threshold value for a given system, which is used in all searches. However, an expert-defined value is quite costly and not always possible. Therefore, this study proposes an approach for automatic and online estimation of the similarity threshold value, to be specifically used by content-based visual information retrieval system (image and video) search engines. The experimental results obtained with the proposed approach prove rather promising. For example, for one of the case studies, the performance of the proposed approach achieved 99.5 % efficiency in comparison with that obtained by a specialist using an empirical similarity threshold. Moreover, such automated approach becomes more scalable and less costly.

