We propose improved XPath query processing algorithms on XML documents by extending the MTree navigational XML database index. Our algorithms efficiently resolve element-name-specific XPath navigational queries, in many cases without the need for sorting or for qualified-name filtering on intermediate sequences. The optimization methods are applicable to all axes but are presented for the four major XPath axes: descendant, ancestor, following and preceding. Experimental results are included that show substantial performance improvements over other well-known methods.
This paper introduces the MTree index algorithm, a special purpose XML XPath index designed to meet the needs of the hierarchical XPath query language. With the increasing importance of XML, XPath, and XQuery, several methods have been proposed for creating XML structure indexes and many variants using relational technology have been proposed. This work proposes a new XML structure index, called MTree, which is designed to be optimal for traversing all XPath axes. The primary feature of MTree lies in its ability to provide the next subtree root node in document order, for all axes, to each context node in O(1). MTree is a special purpose XPath index structure that matches the special purpose query requirements for XPath. This approach is in contrast to other approaches that map the problem domain into general purpose index structures such as B-Tree that must reconstruct the XML tree from those structures for every query. MTree supports modification operations such as insert and delete. MTree has been implemented both in memory and on disk, and performance results using XMark benchmark data are presented showing up to two orders of magnitude improvement over other well-known implementations.
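The abstract does not describe MTree's internal layout, so the following is only an illustrative sketch of the underlying idea: that the start of each axis can be located from a context node in constant time. It uses a simple pre-order numbering with subtree sizes, which is a different and much simpler technique than MTree itself. The names `Node`, `encode`, `descendants`, `following`, `ancestors`, and `preceding` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def encode(root):
    """Flatten a tree into pre-order rows of [name, parent_index, subtree_size]."""
    rows = []

    def visit(node, parent):
        idx = len(rows)
        rows.append([node.name, parent, 0])
        for child in node.children:
            visit(child, idx)
        rows[idx][2] = len(rows) - idx - 1  # number of descendants of this node

    visit(root, -1)
    return rows


def descendants(rows, i):
    # Descendants occupy the contiguous pre-order range right after node i.
    return list(range(i + 1, i + 1 + rows[i][2]))


def following(rows, i):
    # The first following node sits just past node i's subtree.
    return list(range(i + 1 + rows[i][2], len(rows)))


def ancestors(rows, i):
    out = []
    parent = rows[i][1]
    while parent != -1:
        out.append(parent)
        parent = rows[parent][1]
    return out[::-1]  # reversed so the result is in document order


def preceding(rows, i):
    # Everything before node i in document order, minus its ancestors.
    anc = set(ancestors(rows, i))
    return [j for j in range(i) if j not in anc]


# Hypothetical document: a(b(c, d), e(f))
tree = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e", [Node("f")])])
rows = encode(tree)
```

In this encoding the first node of the descendant and following axes is found by index arithmetic alone, while ancestor requires a walk up the parent chain and preceding requires a filter. This hints at why a structure that serves every axis uniformly, as the abstract claims for MTree, is attractive.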
Much research has been done adapting relational technology for use with XML and XPath query processing, several research efforts have focused on native XML databases, and some research efforts have focused on hybrid approaches. This paper presents a hybrid design: we extend the usage of path summary indexes by combining them with partitioned indexes on schema-less XML documents to accelerate XPath query processing. Efficient XPath query processing is important because XPath is the query language used for node selection within XQuery. To index an XML document, each node is assigned a path identifier that is unique for every root-to-node path. A separate XML path summary index is created, itself encoded as an XML document, which summarizes the document structure by eliminating path redundancies which are inherent within many XML document instances. The use of structure summaries is widely adopted. Two additional supporting indexes are utilized: first, the XML structure is placed into a structure index that is partitioned by the path identifier, and second, the XML element and attribute values are placed into a separate value index that is partitioned by the same path identifier. Therefore, we integrate structure summaries, complete structure, and values into a unified index. To support comprehensive integration we use unique implementation and query methods. XPath queries, either partially or fully, are first executed against the summary index to derive candidate path identifiers which are placed into a specialized hash map tree cursor. We introduce the partitioned branching path join, a twig join that enables efficient index nested loop joins between various B+-tree partitions on the same structure relation, guided by the hash map tree cursor.
We conclude with performance results from several queries using our lightweight prototype system, which demonstrates that our combination of methods matches or outperforms existing high-end database engines when determining node sequences for several XPath queries.
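The indexing scheme described in this abstract (path identifiers, a path summary, and structure and value indexes partitioned by path identifier) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: it uses plain dictionaries where the paper uses an XML-encoded summary, B+-tree partitions, and a hash map tree cursor, and it ignores attributes. The function names `build_indexes` and `candidate_path_ids` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Assign one path id per distinct root-to-node path and partition by it."""
    root = ET.fromstring(xml_text)
    path_ids = {}                  # path summary: root-to-node path -> path id
    structure = defaultdict(list)  # path id -> node positions in document order
    values = defaultdict(list)     # path id -> (node position, text value)
    position = 0

    def visit(elem, parent_path):
        nonlocal position
        path = parent_path + "/" + elem.tag
        # Repeated paths reuse the same id, which removes path redundancy.
        pid = path_ids.setdefault(path, len(path_ids) + 1)
        pos = position
        position += 1
        structure[pid].append(pos)
        text = (elem.text or "").strip()
        if text:
            values[pid].append((pos, text))
        for child in elem:
            visit(child, path)

    visit(root, "")
    return path_ids, structure, values


def candidate_path_ids(path_ids, name):
    """Resolve a //name step against the summary only, returning path ids."""
    return sorted(
        pid for path, pid in path_ids.items()
        if path.rsplit("/", 1)[-1] == name
    )


# Hypothetical sample document.
doc = (
    "<site><people>"
    "<person><name>Ann</name></person>"
    "<person><name>Bob</name></person>"
    "</people><items><item><name>Pen</name></item></items></site>"
)
path_ids, structure, values = build_indexes(doc)
name_pids = candidate_path_ids(path_ids, "name")
```

Here the query step is evaluated against the small summary first, and only the partitions for the resulting candidate path identifiers (`structure[pid]` and `values[pid]`) need to be read afterwards, which is the general effect the abstract attributes to combining the summary with partitioned indexes.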