In this paper, we study twig pattern matching in XML document databases. Two algorithms, A1 and A2, are discussed, corresponding to two different definitions of tree embedding. Under the first definition, only the ancestor-descendant relationship is considered. Under the second, both the ancestor-descendant relationship and the order of siblings are taken into account. Both A1 and A2 are based on a subtree reconstruction technique, by which a tree structure is reconstructed from a given set of data streams. More importantly, by revealing an interesting property of tree encoding, we show that subtree reconstruction can be easily extended to a strategy (i.e., A1) for checking subtree matching under the first definition, with path joins and join-like operations avoided completely. A2 needs more time and space because it addresses a more difficult problem, but it likewise involves no join operations. The computational complexities of both algorithms are analysed, showing that they perform better than any existing strategy for this problem.
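The abstract does not say which tree encoding or which reconstruction procedure A1 and A2 use. As an illustration only, the sketch below assumes the common interval (region) encoding, in which each node carries a (start, end) pair and a node is an ancestor of another exactly when its interval strictly contains the other's. The names `Node`, `is_ancestor`, and `reconstruct` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    start: int  # position of the opening tag (assumed encoding)
    end: int    # position of the closing tag (assumed encoding)
    children: list = field(default_factory=list)


def is_ancestor(a: Node, d: Node) -> bool:
    """True when a's interval strictly contains d's interval."""
    return a.start < d.start and d.end < a.end


def reconstruct(stream):
    """Rebuild a forest from interval-encoded nodes using a stack.

    Nodes are processed in document order (sorted by start). The stack
    holds the path from a root to the most recently placed node.
    """
    roots = []
    stack = []
    for node in sorted(stream, key=lambda n: n.start):
        # Pop nodes that cannot be ancestors of the incoming node.
        while stack and not is_ancestor(stack[-1], node):
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


# Example input: nodes arrive in arbitrary order, e.g. merged from several streams.
nodes = [Node("c", 3, 4), Node("a", 1, 10), Node("d", 6, 9), Node("b", 2, 5)]
forest = reconstruct(nodes)
```

Here the ancestor test is a pair of integer comparisons, so the tree shape is recovered without joining paths. Whether this is the property the paper exploits is an assumption.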
Abstract: With the growing importance of XML in data exchange, much research has been done on providing flexible query mechanisms to extract data from XML documents. In this paper, we focus on query evaluation in an XML streaming environment, in which data streams arrive continuously and queries have to be evaluated before all the data of an XML document are available. Two algorithms are discussed. One is for unordered tree matching, in which only ancestor-descendant and parent-child relationships are considered. It requires O(|T'|⋅leaf_Q) time, where T' is a subtree of the document tree T in which each node matches at least one node in the query Q, and leaf_Q is the number of leaf nodes in Q. The other is for ordered tree matching, in which the left-to-right order of nodes must also be taken into account. It runs in O(|T'|⋅|Q|) time. Furthermore, our algorithms achieve high time performance without trading off space requirements: they have the same caching space and buffering space overhead as state-of-the-art stream-querying algorithms. We demonstrate the efficiency and effectiveness of our algorithms through extensive experiments.
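To make the unordered matching semantics concrete, here is a minimal sketch of a naive checker. It is not the streaming algorithm described in the abstract and does not achieve the O(|T'|⋅leaf_Q) bound; it only states what a match means when parent-child ("/") and ancestor-descendant ("//") edges are considered and sibling order is ignored. The classes `TNode` and `QNode` and the functions `matches_at` and `twig_matches` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TNode:
    """Document tree node."""
    tag: str
    children: list = field(default_factory=list)


@dataclass
class QNode:
    """Query (twig) node; edges holds (axis, child) pairs,
    where axis is "/" (parent-child) or "//" (ancestor-descendant)."""
    tag: str
    edges: list = field(default_factory=list)


def descendants(t: TNode):
    """Yield all proper descendants of t."""
    for c in t.children:
        yield c
        yield from descendants(c)


def matches_at(q: QNode, t: TNode) -> bool:
    """True if the twig rooted at q can be embedded with q mapped to t."""
    if q.tag != t.tag:
        return False
    for axis, qc in q.edges:
        candidates = t.children if axis == "/" else descendants(t)
        if not any(matches_at(qc, c) for c in candidates):
            return False
    return True


def twig_matches(q: QNode, root: TNode) -> bool:
    """True if q matches at the root or at any descendant of it."""
    return matches_at(q, root) or any(matches_at(q, d) for d in descendants(root))


# Document: <a><b><c/></b><d/></a>
doc = TNode("a", [TNode("b", [TNode("c")]), TNode("d")])
q_mixed = QNode("a", [("//", QNode("c")), ("/", QNode("d"))])  # a[.//c]/d
q_child = QNode("a", [("/", QNode("c"))])                      # a/c
```

With this document, `q_mixed` matches because `c` is a descendant of `a` and `d` is a child of `a`, while `q_child` does not, since `c` is a grandchild rather than a child. Ordered matching would additionally require the matched nodes to respect the left-to-right order of the query's siblings, which this sketch does not check.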