This paper investigates the benefits of network awareness when processing queries in widely distributed environments such as the Internet. We present algorithms that leverage knowledge of network characteristics (e.g., topology and bandwidth) when deciding on the network locations where the query operators are executed. Using a detailed emulation study based on realistic network models, we analyse and experimentally evaluate the proposed approaches for distributed stream processing. Our results quantify the significant benefits of the network-aware approaches and reveal the fundamental trade-off between bandwidth efficiency and result latency that arises in networked query processing.
Applications ranging from algorithmic trading to scientific data analysis require real-time analytics based on views over databases that change at very high rates. Such views have to be kept fresh at low maintenance cost and latency. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that combine current with aged or historical data. In this paper, we present viewlet transforms, a recursive finite differencing technique applied to queries. The viewlet transform materializes a query and a set of its higher-order deltas as views. These views support each other's incremental maintenance, leading to a reduced overall view maintenance cost. The viewlet transform of a query admits efficient evaluation, the elimination of certain expensive query operations, and aggressive parallelization. We develop viewlet transforms into a workable query execution technique, present a heuristic and cost-based optimization framework, and report on experiments with a prototype dynamic data management system that combines viewlet transforms with an optimizing compilation technique. The system supports tens of thousands of complete view refreshes a second for a wide range of queries.
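The idea of views that support each other's incremental maintenance can be illustrated with a minimal sketch. It is not the paper's system: the query `SUM(r.a * s.b)` over `R JOIN S ON r.k = s.k`, the class `JoinSumView`, and all method and field names are assumptions chosen for illustration. The sketch materializes the query result together with two auxiliary views that act as its first-order deltas, so each single-tuple update needs only a lookup rather than a join.

```python
from collections import defaultdict


class JoinSumView:
    """Hypothetical example: maintains Q = SUM(r.a * s.b)
    over R JOIN S ON r.k = s.k under single-tuple updates."""

    def __init__(self):
        self.q = 0  # materialized query result
        # Delta of Q w.r.t. an update to S needs SUM(r.a) per key.
        self.sum_a_by_k = defaultdict(int)
        # Delta of Q w.r.t. an update to R needs SUM(s.b) per key.
        self.sum_b_by_k = defaultdict(int)

    def update_r(self, k, a, mult=1):
        """Insert (mult=1) or delete (mult=-1) tuple (k, a) in R."""
        # First-order delta: a lookup in the auxiliary view, no join.
        self.q += mult * a * self.sum_b_by_k[k]
        # The auxiliary view's own delta is a constant (second order).
        self.sum_a_by_k[k] += mult * a

    def update_s(self, k, b, mult=1):
        """Insert (mult=1) or delete (mult=-1) tuple (k, b) in S."""
        self.q += mult * b * self.sum_a_by_k[k]
        self.sum_b_by_k[k] += mult * b


def recompute(r_rows, s_rows):
    """Reference result computed from scratch, for comparison only."""
    return sum(a * b for (kr, a) in r_rows for (ks, b) in s_rows if kr == ks)


view = JoinSumView()
view.update_r(1, 2)           # R = {(1, 2)}
view.update_s(1, 3)           # S = {(1, 3)}
view.update_r(1, 4)           # R = {(1, 2), (1, 4)}
view.update_s(2, 5)           # S = {(1, 3), (2, 5)}
view.update_r(1, 2, mult=-1)  # R = {(1, 4)}
print(view.q, recompute([(1, 4)], [(1, 3), (2, 5)]))
```

In this sketch every update costs a constant number of dictionary operations, whereas recomputing the query scans both relations. The abstract describes applying the same differencing step recursively to general queries, which this two-relation example does not attempt.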
We describe a methodology for detecting user errors in spreadsheets, using the notion of units as our basic elements of checking. We define the concept of a header and discuss two types of relationships between headers, namely is-a and has-a relationships. With these, we develop a set of rules to assign units to cells in the spreadsheet. We check for errors by ensuring that every cell has a well-formed unit. We describe an implementation of the system that allows the user to check Microsoft Excel spreadsheets. We have run our system on practical examples, and even found errors in published spreadsheets.
Borealis is a distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora and inter-node communication functionality from Medusa. We propose to demonstrate some of the key aspects of distributed operation in Borealis, using a multi-player network game as the underlying application. The demonstration will illustrate the dynamic resource management, query optimization and high availability mechanisms employed by Borealis, using visual performance-monitoring tools as well as the gaming experience.
We introduce XPORT, a profile-driven distributed data dissemination system that supports an extensible set of data types, profile types, and optimization metrics. XPORT efficiently implements a generic tree-based overlay network, which can be customized per application using a small number of methods that encapsulate application-specific data filtering, profile aggregation, and optimization logic. The clean separation between the "plumbing" and "application" enables the system to uniformly support disparate dissemination-based applications. We first provide an overview of the basic XPORT model and architecture. We then describe in detail an extensible optimization framework, based on a two-level aggregation model, that facilitates easy specification of a wide range of commonly used performance goals. We discuss distributed tree transformation protocols that allow XPORT to iteratively optimize its operation to achieve these goals under changing network and application conditions. Finally, we demonstrate the flexibility and the effectiveness of XPORT using real-world data and experimental results obtained from both prototype-based LAN emulation and deployment on PlanetLab.
We introduce Pulse, a framework for processing continuous queries over models of continuous-time data, which can compactly and accurately represent many real-world activities and processes. Pulse implements several query operators, including filters, aggregates and joins, that work by solving simultaneous equation systems, which in many cases is significantly cheaper than processing a stream of tuples. As such, Pulse translates regular queries to work on continuous-time inputs, to reduce computational overhead and latency while meeting user-specified error bounds on query results. For error bound checking, Pulse uses an approximate query inversion technique that ensures the solver executes infrequently, and only in the presence of errors or in the absence of previously known results. We first discuss the high-level design of Pulse, which we fully implemented in a stream processing system. We then characterise Pulse's behavior through experiments with real data, including financial data from the New York Stock Exchange, and spatial data from the Automatic Identification System for tracking naval vessels. Our results verify that Pulse is practical and demonstrates significant performance gains for a variety of workload and query types.