We report on a query compilation technique that enables the construction of alternative efficient query providers for Microsoft's Language Integrated Query (LINQ) framework. LINQ programs are mapped into an intermediate algebraic form, suitable for execution on any SQL:1999-capable relational database system. This compilation technique leads to query providers that (1) faithfully preserve list order and nesting, both being core features of the LINQ data model, (2) support the complete family of LINQ's Standard Query Operators, (3) bring database support to LINQ to XML where the original provider performs in-memory query evaluation, and, most importantly, (4) emit SQL statement sequences whose size is only determined by the input query's result type (and thus independent of the database size).A sample query scenario uses this LINQ provider to marry database-resident TPC-H and XMark data-resulting in a unique query experience that exhibits quite promising performance characteristics, especially for large data instances.
LINQ QUERY PROVIDERSLanguage Integrated Query (LINQ) seamlessly embeds a declarative query language facility into Microsoft's .NET language framework [14]. Developers use the familiar idioms of their programming language-say, C#-plus a generic set of Standard Query Operators (SQOs) to formulate queries against lists of heap-resident C# objects, XML trees, or relational tables. LINQ providers then translate these programs into an executable form chosen to match the kind of queried data: a LINQ query against database-resident relational tables is compiled into a sequence of SQL statements, for example. This translation is performed by the framework's LINQ to SQL provider. LINQ is widely perceived as a signpost to a new declarative form of data access [2,21] and the availability of efficient LINQ providers will be instrumental.The principal LINQ construct is the query comprehension, a generic iteration facility applicable to any data type Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were presented at The that exhibits the properties of a monad [25] (in LINQ, this role is assumed by the generic types IQueryable
To compensate for the inherent impedance mismatch between the relational data model (tables of tuples) and XML (ordered, unranked trees), tree join algorithms have become the prevalent means to process XML data in relational databases, most notably the TwigStack [6], structural join [1], and staircase join [13] algorithms. However, the addition of these algorithms to existing systems depends on a significant invasion of the underlying database kernel, an option intolerable for most database vendors.Here, we demonstrate that we can achieve comparable XPath performance without touching the heart of the system. We carefully exploit existing database functionality and accelerate XPath navigation by purely relational means: partitioned B-trees bring access costs to secondary storage to a minimum, while aggregation functions avoid an expensive computation and removal of duplicate result nodes to comply with the XPath semantics. Experiments carried out on IBM DB2 confirm that our approach can turn off-the-shelf database systems into efficient XPath processors.
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-theart with a number of new technical contributions, such as looplifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system is evaluated on the XMark benchmark up to data sizes of 11 GB. The performance section also provides an extensive comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.