The emergence of database languages with side effects, notably for XML, raises significant challenges for database compilers and optimizers. In this paper, we extend an algebra for the W3C XML Query language with operations that allow data to be updated immediately. We study the impact of this extension on logical optimization, join detection, and pipelining. The main result of this work is to show that, with proper care, a number of important optimizations based on nested relational algebras remain applicable in the presence of side effects. Our approach relies on an analysis of the conditions that must be checked in order for algebraic rewritings to hold. An implementation and experimental results demonstrate the effectiveness of the approach.
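The abstract's central claim is that algebraic rewritings stay valid only under conditions that must be checked once updates take effect immediately. The toy sketch below illustrates why. It is not the algebra used in this paper: the `Expr` type, its read/write effect summaries, and the `can_commute` guard are hypothetical. Reordering two expressions stands in for rewritings such as join reordering. Without side effects, the reordering is always safe. When one expression updates a document that the other reads, the reordering changes the result, so the rewrite must be guarded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical store: document name -> list of node labels.
Store = Dict[str, List[str]]


@dataclass(frozen=True)
class Expr:
    """A toy expression annotated with the documents it reads and writes (assumed effect summary)."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]
    run: Callable[[Store], object]


def evaluate(exprs: List[Expr], store: Store) -> List[object]:
    """Evaluate left to right; updates are applied immediately, so later expressions see them."""
    return [e.run(store) for e in exprs]


def can_commute(a: Expr, b: Expr) -> bool:
    """Reordering is safe only if neither expression writes a document the other touches."""
    return not (a.writes & (b.reads | b.writes)) and not (b.writes & (a.reads | a.writes))


def commute(a: Expr, b: Expr) -> List[Expr]:
    """Apply the hypothetical reordering rewrite only when its side condition holds."""
    return [b, a] if can_commute(a, b) else [a, b]


def insert_a(store: Store) -> None:
    store["doc"].append("a")  # immediate update of document "doc"


def count_a(store: Store) -> int:
    return store["doc"].count("a")  # reads document "doc"


def count_b(store: Store) -> int:
    return store["log"].count("b")  # reads an unrelated document "log"


INSERT = Expr("insert a", frozenset(), frozenset({"doc"}), insert_a)
COUNT_A = Expr("count a", frozenset({"doc"}), frozenset(), count_a)
COUNT_B = Expr("count b", frozenset({"log"}), frozenset(), count_b)

if __name__ == "__main__":
    # An unguarded reordering changes the observable result when an update is involved.
    print(evaluate([INSERT, COUNT_A], {"doc": [], "log": []}))  # [None, 1]
    print(evaluate([COUNT_A, INSERT], {"doc": [], "log": []}))  # [0, None]
    # The guard rejects that reordering but allows it for independent expressions.
    print(can_commute(INSERT, COUNT_A))  # False
    print(can_commute(INSERT, COUNT_B))  # True
```

The read/write-set test is deliberately coarse: it works at the granularity of whole documents. A real optimizer would need a finer-grained analysis to avoid rejecting too many rewrites.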
INTRODUCTION
In order to facilitate Web development, a number of languages blending database and programming language capabilities have recently been proposed [5,12,13,17,23]. Two well-known examples are LINQ [17], which extends .NET languages such as C# or Visual Basic with querying primitives, and XQueryP [5], which extends the W3C XML Query language [2] with imperative features. Such languages aim to simplify existing Web development practices, which typically rely on several different languages used at different tiers. The ability to blend data processing and programming capabilities frees developers from having to rely on low-level APIs or language embedding for data access, but it also raises significant challenges for compilers. Many database compilers already support expressive languages such as object-oriented languages [8], PL/SQL [11], XSLT [7], or XQuery [2]; however, most of the work on query optimization has focused on languages without side effects. Side effects are essential to support key programming extensions, such as updates and variable assignment in XQueryP [5], or method calls in object languages [8].

In this paper, we propose techniques to adapt existing database compilers to support side effects in XQuery while preserving essential optimizations based on algebraic rewritings. Surprisingly, there has been very little prior work in this area. One notable exception is [10], which uses a state monad [20] to support side effects in a nested-relational calculus.
However, that work does not address optimization at the algebraic, logical, or physical level. To the best of our knowledge, we provide the first treatment of side effects for a nested-relational algebra. Due to space constraints, we limit our scope to updates applied during query evaluation, and leave procedural extensions (n...