Holistic Twig Joins on Indexed XML Documents

Jiang, Haifeng; Wang, Wei; Lü, Hongjun; Yu, Jeffrey Xu

doi:10.1016/b978-012722442-8/50032-x

Cited by 190 publications (175 citation statements)

References 18 publications (22 reference statements)

“…A special focus has been on the efficient evaluation of query twig patterns (see, e.g., Bruno et al. 2002, Jiang et al. 2003, Choi et al. 2003, Kaushik et al. 2004). The latter approach integrates a variant of Fagin's Threshold algorithm to return only the most relevant results.…”
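The quoted passage mentions a variant of Fagin's Threshold algorithm for returning only the most relevant results. As a rough illustration, here is a minimal sketch of the textbook Threshold Algorithm (TA) with summed scores; it is not the variant used in the cited work. It assumes every object appears in every score list, and the names `threshold_algorithm`, `by_content`, and `by_structure` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Scored = Tuple[str, float]  # (object id, score); names here are hypothetical


def threshold_algorithm(lists: List[List[Scored]], k: int) -> Tuple[List[Scored], int]:
    """Return the top-k objects by summed score and the sorted-access depth used.

    Each input list holds (object, score) pairs sorted by score in descending
    order; every object is assumed to appear in every list.
    """
    # Random-access tables: look up any object's score in list i directly.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen: Dict[str, float] = {}  # object -> full aggregated score
    for depth in range(len(lists[0])):
        frontier = []
        for lst in lists:
            obj, score = lst[depth]  # sorted access: next row of each list
            frontier.append(score)
            if obj not in seen:  # random access to compute the full score
                seen[obj] = sum(table[obj] for table in lookup)
        # No unseen object can score higher than the sum of the frontier scores.
        threshold = sum(frontier)
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1  # stop early: the top-k set is final
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1]), len(lists[0])


by_content = [("x", 0.9), ("y", 0.8), ("z", 0.3)]
by_structure = [("y", 0.9), ("x", 0.6), ("z", 0.5)]
top, depth = threshold_algorithm([by_content, by_structure], k=1)
print(top[0][0], depth)  # prints: y 2
```

The early stop is the point of TA: once the k-th best full score reaches the threshold, the remaining list entries (here `z`) never need to be read.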

Section: Related Work (mentioning; confidence: 99%)

Semantic Similarity Search on Semistructured Data with the XXL Search Engine

2005

Abstract. Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

“…Because so many path-processing operators and join mechanisms were proposed in the literature for the processing of query tree patterns (QTP) and because we wanted to check them with our own optimization ideas, we implemented for each of the various solution classes the best-rated algorithm in XTC to provide an identical runtime environment and to use a full-fledged XDBMS (with appropriate indexes available) for accurate cross-comparisons: Structural Joins, TwigStack, TJFast, Twig2Stack, and TwigList [2,6,17,19,22]. Structural Join as the oldest method decomposes a QTP into its binary relationships and executes them separately.…”
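The passage describes Structural Join as decomposing a QTP into binary relationships that are executed separately. A minimal sketch of one such binary ancestor-descendant join follows, in the stack-based merge style of the classic structural-join literature; it is not XTC's implementation. It assumes each element carries a hypothetical `(start, end)` region label, so that `a` is an ancestor of `d` exactly when `a.start < d.start` and `d.end < a.end`, and that all positions are distinct.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # hypothetical (start, end) region label of one element


def structural_join(ancestors: List[Region], descendants: List[Region]) -> List[Tuple[Region, Region]]:
    """Return every (a, d) pair where element a is an ancestor of element d.

    Both inputs must be sorted by start position; regions are properly nested
    or disjoint, as they are for elements of one XML document.
    """
    result: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # a chain of nested ancestor candidates
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:  # drop entries that ended before a
                stack.pop()
            stack.append(a)
            i += 1
        # Drop entries that ended before d starts; they cannot contain d or any later d.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Every remaining entry contains d, so each one yields an output pair.
        result.extend((a, d) for a in stack)
    return result


a_regions = [(1, 10), (2, 9), (11, 14)]
d_regions = [(3, 4), (6, 7), (12, 13)]
print(structural_join(a_regions, d_regions))  # 5 pairs
```

Each input list is scanned once, and the stack never holds more entries than the nesting depth of the document, which is why this merge is preferred over a nested-loop join for a single edge.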

Section: Path-processing Operators (mentioning; confidence: 99%)

Benchmarking Performance-Critical Components in a Native XML Database System

Schmidt, Bächle, Härder

2009

Database Systems for Advanced Applications

Abstract. The rapidly increasing number of XML-related applications indicates a growing need for efficient, dynamic, and native XML support in database management systems (XDBMS). So far, both industry and academia have primarily focused on benchmarking high-level performance figures for a variety of applications, queries, or documents (frequently executed in artificial workload scenarios) and, therefore, may analyze and compare only specific or incidental behavior of the underlying systems. To cover the full XDBMS support, it is mandatory to benchmark performance-critical components bottom-up, thereby removing bottlenecks and optimizing component behavior. In this way, wrong conclusions are avoided when new techniques such as tailored XML operators, index types, or storage mappings with unfamiliar performance characteristics are used. As an experience report, we present what we have learned from benchmarking a native XDBMS and recommend certain setups to do it in a systematic and meaningful way.

Motivation

The increasing presence of XML data and XML-enabled (database) applications is raising the demand for established XML benchmarks. During the last years, a handful of ad-hoc benchmarks emerged, and some of them served as a basis for ongoing XML research [5,29,39], thus constituting a kind of XML "standard" benchmark. All these benchmarks address the XDBMS behavior and performance visible at the application interface (API) and fail to evaluate and compare properties of the XDBMS components involved in XQuery processing.

However, the development of native XDBMSs should be test-driven for all system layers separately, as was successfully done in the relational world, before such high-level benchmarks are used to confirm the suitability and efficiency of an XDBMS for a given application domain. In the same way, only high-level features such as document store/retrieve and complete XQuery expressions were drawn on when comparing and adapting XML benchmark capabilities [21,26,31,32]. These can often be characterized as "black-box" approaches and are apparently inappropriate for analyzing internal system behavior in detail. This also applies to other approaches that focused on specific problems such as efficiently handling "shredding" or NULL values.

“…Below, we review the existing work on path/twig query evaluation, none of which addresses not-predicates. Earlier works [3,5,9,10,12,13,14] have focused on a decomposition-based approach in which a path query is decomposed into a set of binary (parent-child and ancestor-descendant) relationships between pairs of query nodes. The query is then matched by (1) matching each of the binary structural relationships against the XML data, and (2) "stitching" together these basic matches.…”
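Steps (1) and (2) of the decomposition-based approach can be sketched for the path query `a//b//c`. This is a minimal illustration under stated assumptions: elements carry hypothetical `(start, end)` region labels, a nested loop stands in for a real binary structural join, and all function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # hypothetical (start, end) region label


def contains(outer: Region, inner: Region) -> bool:
    # Region containment means ancestor-descendant.
    return outer[0] < inner[0] and inner[1] < outer[1]


def binary_matches(anc: List[Region], desc: List[Region]) -> List[Tuple[Region, Region]]:
    # Step (1): match one ancestor-descendant edge on its own (nested loop for brevity).
    return [(a, d) for a in anc for d in desc if contains(a, d)]


def stitch(ab: List[Tuple[Region, Region]], bc: List[Tuple[Region, Region]]) -> List[Tuple[Region, Region, Region]]:
    # Step (2): join the two edge results on their shared query node b.
    c_by_b: Dict[Region, List[Region]] = {}
    for b, c in bc:
        c_by_b.setdefault(b, []).append(c)
    return [(a, b, c) for a, b in ab for c in c_by_b.get(b, [])]


a_nodes = [(1, 10)]
b_nodes = [(2, 5), (11, 16)]
c_nodes = [(3, 4), (12, 13)]
ab = binary_matches(a_nodes, b_nodes)  # [((1, 10), (2, 5))]
bc = binary_matches(b_nodes, c_nodes)  # two pairs, one with b = (11, 16)
print(stitch(ab, bc))  # [((1, 10), (2, 5), (3, 4))]
```

The pair `((11, 16), (12, 13))` is produced in step (1) but discarded in step (2) because no `a` contains that `b`. Such useless intermediate results are what holistic path and twig joins aim to avoid.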

Section: Related Work (mentioning; confidence: 99%)

PathStack¬: A Holistic Path Join Algorithm for Path Query with Not-Predicates on XML Data

Jiao, Ling, Chan

2005

Database Systems for Advanced Applications

Abstract. The evaluation of path queries forms the basis of complex XML query processing, which has attracted a lot of research attention. However, none of these works has examined the processing of more complex queries that contain not-predicates. In this paper, we present the first study on evaluating path queries with not-predicates. We propose an efficient holistic path join algorithm, PathStack¬, which has the following advantages: (1) it requires only one scan of the relevant data to evaluate path queries with not-predicates; (2) it does not generate any intermediate results; and (3) its memory space requirement is bounded by the longest path in the input XML document. We also present an improved variant of PathStack¬ that further minimizes unnecessary computations.
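The properties listed above (one scan, no intermediate results, memory bounded by the longest path) are those of stack-based holistic path joins. The following is a minimal sketch of a plain PathStack-style join for a path query `q0//q1//...//qn` in the spirit of Bruno et al. (2002); it does not handle not-predicates and is not PathStack¬ itself. It assumes hypothetical `(start, end)` region labels with distinct positions and disjoint input streams (one distinct tag per query node); `path_stack` and `expand` are hypothetical names.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # hypothetical (start, end) region label


def path_stack(streams: List[List[Region]]) -> List[Tuple[Region, ...]]:
    """Match the path query q0//q1//...//qn in one merged scan of the streams.

    streams[i] lists the regions of elements matching query node qi.
    """
    n = len(streams)
    # One pass over all relevant elements in document (start) order.
    events = sorted((region, i) for i, stream in enumerate(streams) for region in stream)
    # stacks[i] holds (region, index of the top of stacks[i - 1] at push time).
    stacks: List[List[Tuple[Region, int]]] = [[] for _ in range(n)]
    results: List[Tuple[Region, ...]] = []

    def expand(i: int, idx: int) -> List[Tuple[Region, ...]]:
        # Every parent-stack entry at or below the saved pointer is an ancestor.
        region, parent_idx = stacks[i][idx]
        if i == 0:
            return [(region,)]
        return [prefix + (region,)
                for p in range(parent_idx + 1)
                for prefix in expand(i - 1, p)]

    for region, i in events:
        # Pop entries that ended before this element starts, so each stack
        # holds at most one entry per element on the current root-to-leaf path.
        for stack in stacks:
            while stack and stack[-1][0][1] < region[0]:
                stack.pop()
        if i > 0 and not stacks[i - 1]:
            continue  # no open match for the parent query node: skip element
        stacks[i].append((region, len(stacks[i - 1]) - 1 if i > 0 else -1))
        if i == n - 1:
            results.extend(expand(i, len(stacks[i]) - 1))  # emit full path matches
            stacks[i].pop()  # a leaf match is not needed after its output
    return results


a = [(1, 20), (21, 30)]
b = [(2, 10), (5, 9)]
c = [(3, 4), (6, 7), (11, 12), (22, 23)]
print(path_stack([a, b, c]))
```

Here `c` at `(11, 12)` and `(22, 23)` is skipped without producing any partial result, because no `b` is open when it is scanned. A not-predicate would add further checks on top of this scheme, which the sketch does not attempt.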
