David DeHaan scite author profile

Toman

Consens

et al. 2003

The W3C XQuery language recommendation, based on a hierarchical and ordered document model, supports a wide variety of constructs and use cases. There is a diversity of approaches and strategies for evaluating XQuery expressions, in many cases only dealing with limited subsets of the language. In this paper we describe an implementation approach that handles XQuery with arbitrarily-nested FLWR expressions, element constructors and built-in functions (including structural comparisons). Our proposal maps an XQuery expression to a single equivalent SQL query using a novel dynamic interval encoding of a collection of XML documents as relations, augmented with information tied to the query evaluation environment. The dynamic interval technique enables (suitably enhanced) relational engines to produce predictably good query plans that do not restrict the use of sort-merge join query operators. The benefits are realized despite the challenges presented by intermediate results that create arbitrary documents and the need to preserve document order as prescribed by the semantics of XQuery. Finally, our experimental results demonstrate that (native or relational) XML systems can benefit from the above technique to avoid a quadratic scale-up penalty that effectively prevents the evaluation of nested FLWR expressions for large documents.
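
As a rough illustration of the general idea of encoding XML nodes as intervals, the sketch below assigns each element a (left, right) pair from a depth-first traversal, so that ancestor/descendant tests reduce to interval containment and sorting by the left endpoint recovers document order. This is a plain static interval encoding written for illustration only; it is not the dynamic interval encoding of the paper, and the names interval_encode, is_descendant, and descendants_of are hypothetical.

```python
import xml.etree.ElementTree as ET


def interval_encode(xml_text):
    """Return (tag, left, right) rows for every element, in document order.

    Assumption: a simple static numbering, not the paper's dynamic scheme.
    """
    rows = []
    counter = 0

    def visit(node):
        nonlocal counter
        row = [node.tag, counter, None]  # left endpoint assigned on entry
        counter += 1
        rows.append(row)
        for child in node:
            visit(child)
        row[2] = counter                 # right endpoint assigned on exit
        counter += 1

    visit(ET.fromstring(xml_text))
    return [tuple(row) for row in rows]


def is_descendant(inner, outer):
    """True if inner's interval lies strictly inside outer's interval."""
    return outer[1] < inner[1] and inner[2] < outer[2]


def descendants_of(rows, tag):
    """Tags of all descendants of elements named `tag`, in document order."""
    return [
        inner[0]
        for outer in rows if outer[0] == tag
        for inner in rows if is_descendant(inner, outer)
    ]


rows = interval_encode("<a><b><c/></b><d/></a>")
print(rows)
print(descendants_of(rows, "a"))
```

In a relational setting the containment test would become a range predicate in a join condition, which is one reason interval-style encodings pair naturally with sort-merge joins.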

Identifying frequent items in sliding windows over on-line packet streams

Golab

Demaine

et al. 2003

Internet traffic patterns are believed to obey the power law, implying that most of the bandwidth is consumed by a small set of heavy users. Hence, queries that return a list of frequently occurring items are important in the analysis of real-time Internet packet streams. While several results exist for computing frequent item queries using limited memory in the infinite stream model, in this paper we consider the limited-memory sliding window model. This model maintains the last N items that have arrived at any given time and forbids the storage of the entire window in memory. We present a deterministic algorithm for identifying frequent items in sliding windows defined over real-time packet streams. The algorithm uses limited memory, requires constant processing time per packet (amortized), makes only one pass over the data, and is shown to work well when tested on TCP traffic logs.
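
One way to picture frequent-item counting over a window without storing the window itself is to split the window into fixed-size blocks, keep only a small top-k summary per completed block, and expire whole blocks as the window advances. The sketch below follows that idea under stated assumptions: the class name BlockedWindowCounter, the parameters block_size and k, and the threshold query are all hypothetical, and the code is an illustration of the limited-memory sliding window setting rather than the algorithm presented in the paper. Counts are approximate because items outside a block's top k are dropped.

```python
from collections import Counter, deque


class BlockedWindowCounter:
    """Approximate frequent items over the last `window_size` stream items.

    Assumption: the window advances one whole block at a time.
    """

    def __init__(self, window_size, block_size, k):
        if window_size % block_size != 0:
            raise ValueError("window_size must be a multiple of block_size")
        self.block_size = block_size
        self.max_blocks = window_size // block_size
        self.k = k
        self.current = Counter()   # exact counts for the block being filled
        self.seen_in_block = 0
        self.summaries = deque()   # one top-k summary per completed block
        self.totals = Counter()    # sum of the stored summaries

    def add(self, item):
        self.current[item] += 1
        self.seen_in_block += 1
        if self.seen_in_block == self.block_size:
            self._close_block()

    def _close_block(self):
        summary = dict(self.current.most_common(self.k))
        self.summaries.append(summary)
        self.totals.update(summary)
        if len(self.summaries) > self.max_blocks:
            # Expire the oldest block and subtract its contribution.
            for item, count in self.summaries.popleft().items():
                self.totals[item] -= count
                if self.totals[item] <= 0:
                    del self.totals[item]
        self.current = Counter()
        self.seen_in_block = 0

    def frequent(self, threshold):
        """Items whose summarized count is at least `threshold`."""
        return sorted(i for i, c in self.totals.items() if c >= threshold)


counter = BlockedWindowCounter(window_size=6, block_size=3, k=1)
for packet in ["a", "a", "b", "a", "c", "c"]:
    counter.add(packet)
after_two_blocks = counter.frequent(2)
for packet in ["d", "d", "d"]:
    counter.add(packet)
after_three_blocks = counter.frequent(2)
print(after_two_blocks, after_three_blocks)
```

Memory use in this sketch is bounded by one block of exact counts plus k entries per stored block, independent of how long the stream runs.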

Optimal top-down join enumeration

Tompa

2007

Most contemporary database systems perform cost-based join enumeration using some variant of System-R's bottom-up dynamic programming method. The notable exceptions are systems based on the top-down transformational search of Volcano/Cascades. As recent work has demonstrated, bottom-up dynamic programming can attain optimality with respect to the shape of the join graph; no comparable results have been published for transformational search. However, transformational systems leverage benefits of top-down search not available to bottom-up methods. In this paper we describe a top-down join enumeration algorithm that is optimal with respect to the join graph. We present performance results demonstrating that a combination of optimal enumeration with search strategies such as branch-and-bound yields an algorithm significantly faster than those previously described in the literature. Although our algorithm enumerates the search space top-down, it does not rely on transformations and thus retains much of the architecture of traditional dynamic programming. As such, this work provides a migration path for existing bottom-up optimizers to exploit top-down search without drastically changing to the transformational paradigm.
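
To make the contrast between top-down and bottom-up enumeration concrete, the sketch below enumerates join orders top-down with memoization: it starts from the full set of relations, splits it into two connected halves, and recurses. It is a naive generate-and-test partitioning, so it is not optimal with respect to the join graph in the sense the abstract describes, and it omits branch-and-bound. The function name best_join_plan, the cost model (sum of intermediate result sizes), and the example cardinalities and selectivities are all assumptions made for illustration.

```python
from functools import lru_cache
from itertools import combinations


def best_join_plan(cards, edges):
    """Return (cost, plan) for joining all relations without cross products.

    cards: relation name -> cardinality.
    edges: frozenset({a, b}) -> join selectivity (hypothetical values).
    Returns None if the join graph is not connected.
    """

    def connected(rels):
        start = next(iter(rels))
        seen, stack = {start}, [start]
        while stack:
            cur = stack.pop()
            for other in rels - seen:
                if frozenset((cur, other)) in edges:
                    seen.add(other)
                    stack.append(other)
        return seen == set(rels)

    def size(rels):
        result = 1.0
        for rel in rels:
            result *= cards[rel]
        for pair, selectivity in edges.items():
            if pair <= rels:
                result *= selectivity
        return result

    @lru_cache(maxsize=None)
    def solve(rels):
        if len(rels) == 1:
            (name,) = rels
            return 0.0, name
        members = sorted(rels)
        anchor, rest = members[0], members[1:]
        best = None
        # Keeping the anchor on the left avoids trying each split twice.
        for extra_count in range(len(rest)):
            for extra in combinations(rest, extra_count):
                left = frozenset((anchor,) + extra)
                right = rels - left
                if not (connected(left) and connected(right)):
                    continue  # naive test; wasted work on failed splits
                left_cost, left_plan = solve(left)
                right_cost, right_plan = solve(right)
                cost = left_cost + right_cost + size(rels)
                if best is None or cost < best[0]:
                    best = (cost, (left_plan, right_plan))
        return best

    all_rels = frozenset(cards)
    return solve(all_rels) if connected(all_rels) else None


cards = {"A": 10, "B": 100, "C": 1000}
edges = {frozenset(("A", "B")): 0.1, frozenset(("B", "C")): 0.01}
cost, plan = best_join_plan(cards, edges)
print(cost, plan)
```

The splits rejected by the connectivity test are exactly the wasted effort that an enumeration strategy tuned to the join graph would aim to avoid.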

A comprehensive XQuery to SQL translation using dynamic interval encoding

Toman

Consens

et al. 2003

Exploiting ordered dictionaries to efficiently construct histograms with q-error guarantees in SAP HANA

Moerkotte

DeHaan

May

et al. 2014

Histograms that guarantee a maximum multiplicative error (q-error) for estimates may significantly improve the plan quality of query optimizers. However, the construction time for histograms with maximum q-error was too high for practical use cases. In this paper we extend this concept with a threshold, i.e., an estimate or true cardinality θ, below which we do not care about the q-error because we still expect optimal plans. This allows us to develop far more efficient construction algorithms for histograms with bounded error. The test for θ,q-acceptability developed also exploits the order-preserving dictionary encoding of SAP HANA. We have integrated this family of histograms into SAP HANA, and we report on the construction time, histogram size, and estimation errors on real-world data sets. In virtually all cases the histograms can be constructed in far less than one second, requiring less than 5% of space compared to the original compressed data.
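
The multiplicative error mentioned above can be written as a small function: for an estimate and a true cardinality, the q-error is the larger of the two ratios between them. The sketch below also shows one plausible reading of the threshold idea, in which an estimate is accepted outright when both it and the true cardinality are at most θ, and otherwise must have q-error at most q. The function names and this exact formulation of acceptability are assumptions for illustration, not definitions taken from the paper.

```python
def q_error(estimate, actual):
    """Multiplicative error: max(estimate/actual, actual/estimate)."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("q-error is defined here only for positive values")
    return max(estimate / actual, actual / estimate)


def is_theta_q_acceptable(estimate, actual, theta, q):
    """Assumed reading of the threshold test described in the abstract."""
    if estimate <= theta and actual <= theta:
        return True   # both below the threshold: the error is ignored
    if estimate <= 0 or actual <= 0:
        return False  # unbounded multiplicative error above the threshold
    return q_error(estimate, actual) <= q


print(q_error(50, 100))
print(is_theta_q_acceptable(3, 9, theta=10, q=2))
print(is_theta_q_acceptable(20, 100, theta=10, q=2))
```

Under this reading, small cardinalities never force a histogram bucket to be split, which is one intuition for why a threshold could make construction cheaper.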