XML is becoming the standard for exchanging and querying information across enterprises. Furthermore, much e-business XML data increasingly relies on accompanying XSD schema specifications (http://www.w3.org/XML/Schema) to ensure semantically meaningful exchanges of information. Languages such as XPath and XQuery have been proposed for querying XML data. One approach to supporting queries over such XML data is to build a native XML storage and query engine [3]. Alternatively, in many scenarios, "shredding" XML data (with its associated XSD specification) into a relational database is an attractive option for storage, as it can take full advantage of mature relational database technology. The latter approach requires two tasks to ensure efficient execution of XPath queries over XML data: (1) designing the logical mapping from the XML schema to a relational schema; (2) selecting the physical design structures (i.e., indexes, materialized views, and partitioning) of the relational database into which the XML is shredded.

Although the efficiency of a mapping depends on both of the steps above, past work such as [1] focuses exclusively on the logical design step. In this paper, we examine the interplay of logical and physical design, and experimentally demonstrate that: (1) solving the logical mapping problem and the physical design problem independently leads to a suboptimal solution; (2) taking the physical design space into account changes the space of logical mappings worth considering. Specifically, the well-known outlining and inlining mapping options [1] are rendered unnecessary because they are functionally subsumed by two physical design options: indexes and vertical partitioning. On the other hand, we identify mapping options that are important to leverage when an XSD specification includes "choice", "optional", and maxOccurs constructs. This is because these constructs imply complex constraints that are difficult to capture solely via physical design in relational databases.
For example, an "optional" element (minOccurs = 0 and maxOccurs = 1) in XSD specifies that the element, its subelements, and its attributes are either all null or all non-null. This corresponds to a complex constraint in a relational database: that a set of columns (possibly from different tables) is null at the same time. This constraint is difficult for a user to specify in a relational database (although a user can easily specify whether a single column may hold null values), and it cannot be inferred from the relational schema mapped from the XML schema. However, we can exploit this constraint by splitting the table for that element into two tables, one storing the instances that have the optional element and the other storing those that do not. As a result, queries that access only one of the two tables may perform better.

Our decision to take into account the interplay of logical and physical design for mapping XML documents requires us to solve a difficult search problem, as the combined space of logical and physical design is extremely large. We propos...
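The table split described for an "optional" element can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the sample document, the `order`/`shipTo` element names, the two table names, and the use of SQLite as the relational store. The point it tries to show is that once rows are routed by the presence of the optional element, the "all null or all non-null" constraint becomes ordinary `NOT NULL` columns in one table and absent columns in the other.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: <shipTo> plays the role of an optional element
# (minOccurs = 0, maxOccurs = 1) with two subelements.
DOC = """
<orders>
  <order id="1"><item>pen</item><shipTo><city>Oslo</city><zip>0150</zip></shipTo></order>
  <order id="2"><item>ink</item></order>
  <order id="3"><item>pad</item><shipTo><city>Rome</city><zip>00100</zip></shipTo></order>
</orders>
"""


def shred(doc: str) -> sqlite3.Connection:
    """Shred <order> elements into two tables, split on the optional <shipTo>."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Orders that carry the optional element: its columns are never null.
        CREATE TABLE order_with_shipto (
            id INTEGER PRIMARY KEY, item TEXT NOT NULL,
            city TEXT NOT NULL, zip TEXT NOT NULL);
        -- Orders without it: the optional element's columns do not exist at all.
        CREATE TABLE order_without_shipto (
            id INTEGER PRIMARY KEY, item TEXT NOT NULL);
        """
    )
    for order in ET.fromstring(doc.strip()).iter("order"):
        order_id = int(order.get("id"))
        item = order.findtext("item")
        ship = order.find("shipTo")
        if ship is None:
            conn.execute(
                "INSERT INTO order_without_shipto VALUES (?, ?)", (order_id, item)
            )
        else:
            conn.execute(
                "INSERT INTO order_with_shipto VALUES (?, ?, ?, ?)",
                (order_id, item, ship.findtext("city"), ship.findtext("zip")),
            )
    return conn


conn = shred(DOC)
# A path such as /orders/order/shipTo/city only needs the "with" table.
cities = [row[0] for row in conn.execute(
    "SELECT city FROM order_with_shipto ORDER BY id")]
# Orders lacking the optional element are found without any IS NULL test.
missing = [row[0] for row in conn.execute(
    "SELECT id FROM order_without_shipto ORDER BY id")]
print(cities, missing)
```

In this sketch, a query over the optional element's content scans only the rows that actually contain it, which is the kind of benefit the text attributes to the split; whether it pays off in practice would depend on the workload and on the physical design chosen alongside it.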