This paper proposes the use of repetitive broadcast as a way of augmenting the memory hierarchy of clients in an asymmetric communication environment. We describe a new technique called \Broadcast Disks" for structuring the broadcast in a way that provides improved performance for non-uniformly accessed data. The Broadcast Disk superimposes multiple disks spinning at di erent speeds on a single broadcast channel | in e ect creating an arbitrarily ne-grained memory hierarchy. In addition to proposing and de ning the mechanism, a main result of this work is that exploiting the potential of the broadcast structure requires a re-evaluation of basic cache management policies. We examine several \pure" cache management policies and develop and measure implementable approximations to these policies. These results and others are presented in a set of simulation studies that substantiates the basic idea and develops some of the intuitions required to design a particular broadcast program.
Selectivity estimation of queries is an important and well-studied problem in relational database systems. In this paper, we examine selectivity estimation in the context of Geographic Information Systems, which manage spatial data such as points, lines, poly-lines and polygons. In particular, we focus on point and range queries over two-dimensional rectangular data. We propose several techniques based on using spatial indices, histograms, binary space partitionings (BSPs), and the novel notion of spatial skew. Our techniques carefully partition the input rectangles into subsets and approximate each partition accurately. We present a detailed experimental study comparing the proposed techniques and the best known sampling and parametric techniques. We evaluate them using synthetic as well as real-life TIGER datasets. Based on our experiments, we identify a BSP based partitioning that we call Min-Skew which consistently provides the most accurate selectivity estimates for spatial queries. The Min-Skew partitioning can be constructed efficiently, occupies very little space, and provides accurate selectivity estimates over a broad range of spatial queries.
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into groups and then aggregate the information in each group (group-by queries). Depending on the data, there can be a wide disparity b e t ween the number of data items in each group. As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very few (often zero) tuples.In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries. These techniques rely on precomputed non-uniform (biased) samples of the data. In particular, we p r o p o s e congressional samples, a h ybrid union of uniform and biased samples. Given a xed amount of space, congressional samples seek to maximize the accuracy for all possible group-by queries on a set of columns. We present a one pass algorithm for constructing a congressional sample and use this technique to also incrementally maintain the sample up-to-date without accessing the base relation. We also evaluate query rewriting strategies for providing approximate answers from congressional samples. Finally, w e conduct an extensive set of experiments on the TPC-D database, which demonstrates the e cacy of the techniques proposed.
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for join-queries using only statistics (in particular, samples) from the base relations. We propose join synopses (join samples) as an effective solution for this problem and show how precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins. We present optimal strategies for allocating the available space among the various join synopses when the query work load is known and identify heuristics for the common case when the work load is not known. We also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations. One of our key contributions is a detailed analysis of the error bounds obtained for approximate answers that demonstrates the trade-offs in various methods, as well as the advantages in certain scenarios of a new subsampling method we propose. Our extensive set of experiments on the TPC-D benchmark database show the effectiveness of join synopses and various other techniques proposed in this paper.
Mobile computers and wireless networks are emerging technologies which promise to make ubiquitous computing a reality. One challenge that must be met in order to truly realize this potential is that of providing mobile clients with ubiquitous access to data. Mobile clients may often be disconnected from stationary server machines or may have only a low-bandwidth channel for sending messages to servers. Such an environment raises di culties for supporting data-intensive applications for three reasons: 1) the inability to predict, with 100% accuracy, the future data needs of many applications, 2) limits on storage capacities of mobile machines, and 3) the need to provide clients with new or updated data values. One (and perhaps the only) way to address these challenges is to provide stationary server machines with a relatively high-bandwidth channel over which to broadcast data to a client population in anticipation of the need for that data at the clients. Such a system can be said to be asymmetric due to the disparity in the transmission capacities of clients and servers. We have proposed a mechanism called Broadcast Disks to provide database access in this environment as well as in other asymmetric systems such as cable and direct broadcast satellite television networks and information distribution services. The Broadcast Disk approach enables the creation of an arbitrarily ne-grained memory hierarchy on the broadcast medium. This hierarchy, combined with the inversion of the traditional relationship between clients and servers that occurs in a broadcast-based system raises fundamental new issues for client cache management and data prefetching. In this paper we present a brief overview of asymmetric environments and describe our approaches to broadcast disk organization, client cache management, and prefetching.
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for join-queries using only statistics (in particular, samples) from the base relations. We propose join synopses (join samples) as an effective solution for this problem and show how precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins. We present optimal strategies for allocating the available space among the various join synopses when the query work load is known and identify heuristics for the common case when the work load is not known. We also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations. One of our key contributions is a detailed analysis of the error bounds obtained for approximate answers that demonstrates the trade-offs in various methods, as well as the advantages in certain scenarios of a new subsampling method we propose. Our extensive set of experiments on the TPC-D benchmark database show the effectiveness of join synopses and various other techniques proposed in this paper.
Aqua is a system for providing fast,, approximate answers to aggregate queries, which are very common in OLAP applications. It has been designed to run on top of any commercial relational DBMS. Aqua precom:putes synopses (special statistical summaries) of the original data and stores them in the DBMS. It provides approximate answers along with quality guarantees by rewriting the queries to run on these synopses. Finally, Aqua keeps the synopses up-todate as the database changes, using fast incremental maintenance techniques.
The increasing ability to interconnect computexx throughintemetworking, wireless networks, high-bandwidth satellite, and cable networks has spawned a new class of information-centeredapplications based on data dissemination. These applications employ broadcast to deliver data to very large client populations. We have proposed the Broadcast Disks paradigm [Zdon94, Acha95b] for organizing the contents of a data broadcast program and for managing client resources in response to such a program. Our previous work on Bxuadcast Disks focused exclusively on the "push-based" approach, where data is sent out on the broadcast channel according to a periodic schedule, in anticipation of client requests. In this paper, we study how to augment the push-only model with a "pull-based" approach of using a backchannel to allow clients to send explicit requests for data to the server. We analyze the scalability and performance of a broadcast-based system that integrates push and pull and study the impact of this integration on both the steady state and warm-up performance of clients. Our results show that a client backchannel can provide significant performance improvement in the broadcast environmen~but that unconstrained use of the backchannel can result in scalability problems due to server saturation. We propose and investigate a set of three techniques that can delay the onset of saturation and thus, enhance the performance and scalability of the system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.