XML documents contain substantial redundancy in their structure part, because each path from the root node to a leaf node is explicitly represented and, typically, large sets of such path instances belong to the same path class, i.e., the nodes of the path instances are labeled by the same sequence of element (or attribute) names. To save storage space and I/O cost, we want to eliminate this structural redundancy to the extent possible. While all known methods for the physical representation (storage) of XML documents proceed from the root via the element/attribute hierarchy (internal nodes) down to the leaves (values), we follow an upside-down approach which explicitly stores the values and reconstructs the internal nodes only if needed. The cornerstones of such a solution are suitable node labels and a path synopsis which efficiently represents all path classes of an XML document. As a solution, we propose a compact internal storage format for native XML database systems in which the inner structure of the stored documents is virtualized. Because this elementless storage format provides an efficient reconstruction of a document using its path synopsis, all processing properties are preserved and the semantics of navigational and declarative operations of XML languages remain unchanged. Adjusted indexes support the full spectrum of so-called content-and-structure single path queries. Apart from greatly reduced storage consumption, our approach demonstrates its superiority, compared to competing methods, not only for a substantial fraction of those queries, but also for storing, reconstructing, and navigating XML documents.

Financial support by the Research Center (CM)² of the University of Kaiserslautern is acknowledged.
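The upside-down idea summarized above (store only the values, keep one entry per path class in a path synopsis, and rebuild internal nodes on demand) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `StoredValue`, `reconstruct_inner_nodes`, and `to_xml` are hypothetical, the node labels are simplified Dewey-style tuples with one number per level, and the path synopsis is a plain dictionary from a path class reference to the element names on that path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredValue:
    """One stored leaf value (hypothetical record layout)."""
    label: tuple   # simplified Dewey-style label, one number per level
    pcr: int       # path class reference into the path synopsis
    value: str     # the leaf content


def reconstruct_inner_nodes(values, synopsis):
    """Rebuild every element node on the paths of the stored values.

    Each prefix of a value's label identifies one ancestor; its name is
    the entry at the same depth in the value's path class.
    """
    nodes = {}
    for v in values:
        names = synopsis[v.pcr]
        if len(names) != len(v.label):
            raise ValueError("label depth does not match path class length")
        for depth in range(1, len(names) + 1):
            nodes[v.label[:depth]] = names[depth - 1]
    return nodes


def to_xml(values, synopsis):
    """Serialize the reconstructed document (no escaping, no attributes)."""
    nodes = reconstruct_inner_nodes(values, synopsis)
    text = {v.label: v.value for v in values}

    def emit(label):
        name = nodes[label]
        children = sorted(
            l for l in nodes
            if len(l) == len(label) + 1 and l[:len(label)] == label
        )
        body = text.get(label, "") + "".join(emit(c) for c in children)
        return f"<{name}>{body}</{name}>"

    roots = sorted(l for l in nodes if len(l) == 1)
    return "".join(emit(r) for r in roots)


# Assumed example: two path classes cover all three stored values.
synopsis = {
    1: ("bib", "book", "title"),
    2: ("bib", "book", "author"),
}
values = [
    StoredValue((1, 1, 1), 1, "A"),
    StoredValue((1, 1, 2), 2, "X"),
    StoredValue((1, 2, 1), 1, "B"),
]
document = to_xml(values, synopsis)
```

In this toy example, the element names `bib`, `book`, `title`, and `author` are stored once in the synopsis rather than once per node, and the two `book` elements exist only as label prefixes until the document is reconstructed.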
Motivation

XML models semi-structured data and is becoming the standard for data exchange in many (Web) applications. Because the dramatically growing data volumes thereby incurred have to be saved for a long time (for legal and other reasons) and messages are data, too, database systems are a proven technology to persistently store and conveniently manage such data. To avoid conversion, not only messages but also conventional DB data are increasingly kept in native XML format, often resulting in collections of huge XML documents. Furthermore, XML's flexibility, i.e., the ability to change the data mapping (freedom of cardinality determination, handling of varying or non-existing structures, etc.) without too much impact on applications [25], is also a driving factor to enable heterogeneous data stores and to facilitate data integration. For these reasons, XML databases are currently gaining momentum wherever data flexibility in various forms is a key requirement of the application.

As XML documents permeate information systems and databases at an increasing pace, they are also increasingly