Yifei Lu scite author profile

Given a query string Q, an edit similarity search finds all strings in a database whose edit distance to Q is no more than a given threshold τ. Most existing methods for answering edit similarity queries rely on a signature scheme to generate candidates for the query string. We observe that the number of signatures generated by existing methods is far greater than the lower bound, which results in high query time and index space complexity. In this paper, we show that the lower bound on the minimum signature size is τ + 1. We then propose asymmetric signature schemes that achieve this lower bound. We develop efficient query processing algorithms based on the new schemes. Several dynamic programming-based candidate pruning methods are also developed to further speed up query processing. We have conducted a comprehensive experimental study involving nine state-of-the-art algorithms. The experimental results clearly demonstrate the efficiency of our methods.
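To make the problem statement concrete, the following is a minimal Python sketch of an edit similarity search done by brute force, with a threshold-aware dynamic program for verification. It is an illustration only, not the paper's algorithm or one of its pruning methods: the function names `within_edit_distance` and `edit_similarity_search` are hypothetical, and the two shortcuts used (a length check and stopping once a whole DP row exceeds τ) are standard techniques assumed here for the example.

```python
def within_edit_distance(a: str, b: str, tau: int) -> bool:
    """Return True iff the edit (Levenshtein) distance between a and b is <= tau."""
    # Length filter: each edit changes the length by at most one.
    if abs(len(a) - len(b)) > tau:
        return False
    prev = list(range(len(b) + 1))  # DP row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        # Row minima never decrease, so once a row exceeds tau we can stop.
        if min(cur) > tau:
            return False
        prev = cur
    return prev[-1] <= tau


def edit_similarity_search(query: str, database: list[str], tau: int) -> list[str]:
    """Brute-force search: verify every database string against the query."""
    return [s for s in database if within_edit_distance(query, s, tau)]


if __name__ == "__main__":
    print(edit_similarity_search("kitten", ["sitting", "mitten", "kitchen", "kit"], 2))
```

Signature-based methods such as those described in the abstract aim to avoid verifying every database string in this way by first generating a small candidate set.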


XClean: Providing valid spelling suggestions for XML keyword queries

Wang

et al. 2011


An important facility to aid keyword search on XML data is suggesting alternative queries when user queries contain typographical errors. Query suggestion can thus improve users' search experience by avoiding empty or poor-quality results. In this paper, we study the problem of effectively and efficiently providing quality query suggestions for keyword queries on an XML document. We illustrate certain biases in previous work and propose a principled and general framework, XClean, based on the state-of-the-art language model. Compared with previous methods, XClean can accommodate different error models and XML keyword query semantics without losing rigor. Algorithms have been developed that compute the top-k suggestions efficiently. We performed an extensive experimental study using two large-scale real datasets. The experimental results demonstrate the effectiveness and efficiency of the proposed methods.


Asymmetric signature schemes for efficient exact edit similarity query processing

Qin

Wang

Xiao

et al. 2013

ACM Trans. Database Syst.


Given a query string Q, an edit similarity search finds all strings in a database whose edit distance to Q is no more than a given threshold τ. Most existing methods for answering edit similarity queries employ schemes that generate string subsequences as signatures and generate candidates by set overlap queries on query and data signatures. In this paper, we show that for any such signature scheme, the lower bound on the minimum number of signatures is τ + 1, which is lower than what is achieved by existing methods. We then propose several asymmetric signature schemes, i.e., schemes that extract different numbers of signatures for the data and query strings, which achieve this lower bound. A basic asymmetric scheme is first established on the basis of matching q-chunks and q-grams between two strings. Two efficient query processing algorithms (IndexGram and IndexChunk) are developed on top of this scheme. We also propose novel candidate pruning methods to further improve the efficiency. We then generalize the basic scheme by incorporating novel ideas of floating q-chunks, optimal selection of q-chunks, and reducing the number of signatures using global ordering. As a result, the Super and Turbo families of schemes are developed together with their corresponding query processing algorithms. We have conducted a comprehensive experimental study using the six asymmetric algorithms and nine previous state-of-the-art algorithms. The experimental results clearly showcase the efficiency of our methods and demonstrate the space and time characteristics of our proposed algorithms.
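The idea of matching q-chunks against q-grams can be illustrated with a small Python sketch. This is a simplified count filter under stated assumptions, not the paper's IndexGram, IndexChunk, Super, or Turbo algorithms, and it does not attempt to reach the τ + 1 signature bound. It assumes chunks are taken from data strings and grams from the query, drops any trailing partial chunk instead of padding it, and uses hypothetical names (`q_chunks`, `q_grams`, `build_chunk_index`, `search`). The filter relies on the observation that chunks are disjoint, so a single edit operation can destroy at most one chunk; a data string within distance τ of the query must therefore have at least (number of chunks − τ) chunks that occur as q-grams of the query.

```python
from collections import Counter, defaultdict


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, used to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def q_chunks(s: str, q: int) -> list[str]:
    """Disjoint substrings of length q at positions 0, q, 2q, ... (partial tail dropped)."""
    return [s[i:i + q] for i in range(0, len(s) - q + 1, q)]


def q_grams(s: str, q: int) -> set[str]:
    """All distinct overlapping substrings of length q."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_chunk_index(database: list[str], q: int) -> dict[str, list[int]]:
    """Inverted index: chunk -> ids of data strings, one entry per chunk position."""
    index = defaultdict(list)
    for sid, s in enumerate(database):
        for chunk in q_chunks(s, q):
            index[chunk].append(sid)
    return index


def search(query: str, database: list[str], index, q: int, tau: int) -> list[str]:
    # Count, per data string, how many of its chunks appear as q-grams of the query.
    hits = Counter()
    for gram in q_grams(query, q):
        for sid in index.get(gram, ()):
            hits[sid] += 1
    results = []
    # For simplicity this loop visits every string; strings whose chunk count
    # is at most tau can never be pruned by the count filter anyway.
    for sid, s in enumerate(database):
        needed = len(q_chunks(s, q)) - tau
        if hits[sid] >= needed and edit_distance(query, s) <= tau:
            results.append(s)
    return results


if __name__ == "__main__":
    db = ["similarity", "similarly", "simulation", "signature", "asymmetric", "sim"]
    idx = build_chunk_index(db, q=2)
    print(search("similarity", db, idx, q=2, tau=2))
```

The asymmetry is visible in the sizes: a data string of length n contributes about n/q chunks to the index, while the query contributes about n overlapping grams at search time.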


Top-Down XML Keyword Query Processing

Zhou

Wang

Chen

et al. 2016

IEEE Trans. Knowl. Data Eng.


Improved Spatial Keyword Search Based on IDF Approximation

Zhou

Sun

et al. 2013



Efficient exact edit similarity query processing with the asymmetric signature scheme
