Abstract-Extensible Markup Language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey, we review, classify, and compare major techniques for twig pattern matching.1 Specifically, we consider two classes of major XML query processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, whereas in the native approach, specialized storage and query processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs.Index Terms-XML query processing, twig pattern matching.
In this paper we address the problem of evaluating XPath queries over streaming XML data. We consider a practical XPath fragment called Univariate XPath, which includes the commonly used '/' and '//' axes and allows * -node tests and arbitrarily nested predicates. It is well known that this XPath fragment can be efficiently evaluated in O(|D||Q|) time in the non-streaming environment [18], where |D| is the document size and |Q| is the query size. However, this is not necessarily true in the streaming environment, since streaming algorithms have to satisfy stricter requirement than non-streaming algorithms, in that all data must be read sequentially in one pass. Therefore, it is not surprising that state-of-the-art stream-querying algorithms have higher time complexity than O(|D||Q|).In this paper we revisit the XPath stream-querying problem, and show that Univariate XPath can be efficiently evaluated in O(|D||Q|) time in the streaming environment. Specifically, we propose two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, which are based on the lazy strategy and on the eager strategy, respectively. To the best of our knowledge, LQ and EQ are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance. Further, our algorithms achieve O(|D||Q|) time performance without trading off space performance. Instead, they have better buffering-space performance than state-of-the-art stream-querying algorithms. In particular, EQ achieves optimal buffering-space performance. Our experimental results show that our algorithms have not only good theoretical complexity but also considerable practical performance advantages over existing algorithms.
Abstract-One of the important issues in data warehouse development is the selection of a set of views to materialize in order to accelerate a large number of on-line analytical processing (OLAP) queries. The maintenance-cost view-selection problem is to select a set of materialized views under certain resource constraints for the purpose of minimizing the total query processing cost. However, the search space for possible materialized views may be exponentially large. A heuristic algorithm often has to be used to find a near optimal solution. In this paper, for the maintenance-cost view-selection problem, we propose a new constrained evolutionary algorithm. Constraints are incorporated into the algorithm through a stochastic ranking procedure. No penalty functions are used. Our experimental results show that the constraint handling technique, i.e., stochastic ranking, can deal with constraints effectively. Our algorithm is able to find a near-optimal feasible solution and scales with the problem size well.
We study the problem of finding efficient equivalent viewbased rewritings of relational queries, focusing on query optimization using materialized views under the assumption that base relations cannot contain duplicate tuples. A lot of work in the literature addresses the problems of answering queries using views and query optimization. However, most of it proposes solutions for special cases, such as for conjunctive queries (CQs) or for aggregate queries only. In addition, most of it addresses the problems separately under set or bag-set semantics for query evaluation, and some of it proposes heuristics without formal proofs for completeness or soundness. In this paper we look at the two problems by considering CQ/A queries -that is, both pure conjunctive and aggregate queries, with aggregation functions SUM, COUNT, MIN, and MAX; the DISTINCT keyword in (SQL versions of) our queries is also allowed. We build on past work to provide algorithms that handle this general setting. This is possible because recent results on rewritings of CQ/A queries [1,8] show that there are sound and complete algorithms based on containment tests of CQs.Our focus is that our algorithms are efficient as well as sound and complete. Besides the contribution we make in putting and addressing the problems in this general setting, we make two additional contributions for bag-set and set semantics. First, we propose efficient sound and complete tests for equivalence of CQ/A queries to rewritings that use overlapping views (the algorithms are complete with respect to the language of rewritings). These results apply not only to query optimization, but to all areas where the goal is to obtain efficient equivalent view-based query rewritings. Second, based on these results we propose two sound algorithms, BDPV and CDPV, that find efficient execution plans for CQ/A queries in terms of materialized views. Both algorithms extend the cost-based query-optimization approach of System R [19]. The efficient sound algorithm BDPV is also complete in some cases, whereas CDPV is sound and complete for all CQ/A queries we consider. We present a study of the completeness-efficiency tradeoff in the algorithms, and provide experimental results that show the viability of our approach and test the limits of query optimization using overlapping views.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.