In keyword search over data graphs, an answer is a nonredundant subtree that includes the given keywords. An algorithm for enumerating answers is presented within an architecture that has two main components: an engine that generates a set of candidate answers and a ranker that evaluates their score. To be effective, the engine must have three fundamental properties. It should not miss relevant answers, has to be efficient and must generate the answers in an order that is highly correlated with the desired ranking. It is shown that none of the existing systems has implemented an engine that has all of these properties. In contrast, this paper presents an engine that generates all the answers with provable guarantees. Experiments show that the engine performs well in practice. It is also shown how to adapt this engine to queries under the OR semantics. In addition, this paper presents a novel approach for implementing rankers destined for eliminating redundancy. Essentially, an answer is ranked according to its individual properties (relevancy) and its intersection with the answers that have already been presented to the user. Within this approach, experiments with specific rankers are described.
Lawler-Murty's procedure is a general tool for designing algorithms for enumeration problems (i.e., problems that involve the production of a large set of answers in ranked order), which naturally arise in database management. Lawler-Murty's procedure is used in a variety of modern database applications; particularly in those related to keyword search over structured data. Essentially, this procedure enumerates by invoking a series of instances of an optimization problem (i.e., finding the best solution); solving the optimization problem is the only part that depends on the specific task at hand. The topic of optimizing and parallelizing Lawler-Murty's procedure is investigated. Naive parallelism can be carried out by concurrently solving independent instances of the optimization problem. This can be improved by printing the next answer, in the enumeration order, as soon as none of the concurrent instances can produce a better answer. However, this approach alone suffers from poor utilization of available threads. That leads to the idea of freezing an instance of the optimization problem. Interestingly, not only is freezing beneficial to the parallel execution of Lawler-Murty's, it also substantially reduces the running time of the serial execution. Additional improvements of the freezing technique are then developed to further enhance the utilization of threads, and they result in a significant overall speedup. The effectiveness of the proposed approach is demonstrated on keyword search over data graphs, wherein an extensive experimental study is described.
A system for keyword search on data graphs is demonstrated on two challenging datasets: the large DBLP and Mondial (which is highly cyclic and has a complex schema). The system supports search, exploration and question answering. The demonstration shows how the system copes with the main challenges in keywords search on data graphs. In particular, the system generates answers efficiently and completely (i.e., it does not miss answers). It has an effective ranking mechanism that also takes into account redundancies among answers. Finally, the system uses a novel technique for displaying multi-node subtrees in a compact graphical form that facilitates quick and easy understanding, which is essential for effective browsing of the answers.
A data graph is a convenient paradigm for supporting keyword search that takes into account available semantic structure and not just textual relevance. However, the problem of constructing data graphs that facilitate both efficiency and effectiveness of the underlying system has hardly been addressed. A conceptual model for this task is proposed. Principles for constructing good data graphs are explained. Transformations for generating data graphs from RDB and XML are developed. The results obtained from these transformations are analyzed. It is shown that XML is a better starting point for getting a good data graph.Comment: Full version of DEXA'16 pape
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.