If you would like to write for this, or any other Emerald publication, then please use our Emerald for Authors service information about how to choose which publication to write for and submission guidelines are available for all. Please visit www.emeraldinsight.com/authors for more information.
About Emerald www.emeraldinsight.comEmerald is a global publisher linking research and practice to the benefit of society. The company manages a portfolio of more than 290 journals and over 2,350 books and book series volumes, as well as providing an extensive range of online products and additional customer resources and services.Emerald is both COUNTER 4 and TRANSFER compliant. The organization is a partner of the Committee on Publication Ethics (COPE) and also works with Portico and the LOCKSS initiative for digital archive preservation.Abstract-XML is becoming one of the most important structures for data exchange on the web. Despite having many advantages, XML structure imposes several major obstacles to large document processing. Inconsistency between the linear nature of the current algorithms (e.g. for caching and prefetch) used in operating systems and databases, and the non-linear structure of XML data makes XML processing more costly. In addition to verbosity (e.g. tag redundancy), interpreting (i.e. parsing) depthfirst (DF) structure of XML documents is a significant overhead to processing applications (e.g. query engines). Recent research on XML query processing has learned that sibling clustering can improve performance significantly. However, the existing clustering methods are not able to avoid parsing overhead as they are limited by larger document sizes. In this research, We have developed a better data organization for native XML databases, named sibling-first (SF) format that improves query performance significantly. SF uses an embedded index for fast accessing to child nodes. It also compresses documents by eliminating extra information from the original DF format. The converted SF documents can be processed for XPath query purposes without being parsed. We have implemented the SF storage in virtual memory as well as a format on disk. Experimental results with real data have showed that significantly higher performance can be achieved when XPath queries are conducted on very large SF documents.