A data warehouse stores information that is collected from multiple, heterogeneous information sources for the purpose of complex querying and analysis. Information in the warehouse is typically stored in the form of materialized views, which represent pre-computed portions of frequently asked queries. One of the most important tasks when designing a warehouse is the selection of materialized views to be maintained in the warehouse. The goal is to select a set of views so as to minimize the total query response time over all queries, given a limited amount of time for maintaining the views (the maintenance-cost view selection problem). In this paper, we propose an efficient solution to the maintenance-cost view selection problem that uses a genetic algorithm to compute a near-optimal set of views. Specifically, we explore the maintenance-cost view selection problem in the context of OR view graphs. We show that our approach represents a dramatic improvement in time complexity over existing heuristic, search-based approaches. Our analysis shows that the algorithm consistently yields a solution that lies within 10% of the optimal query benefit while exhibiting only a linear increase in execution time. We have implemented a prototype of our algorithm, which we used to run the simulations on which our analysis is based.
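The abstract does not spell out the encoding or genetic operators, so the following is only a minimal sketch of how a genetic algorithm can attack the maintenance-cost view selection problem, under simplifying assumptions. Each chromosome is a bit vector marking which candidate views are materialized. In the OR view graph, a query can be answered from any one of several views and uses the cheapest materialized one. Maintenance costs are treated as additive, although in general the maintenance cost of a view can depend on which other views are materialized. Infeasible selections always score below feasible ones. The instance data, function names, and GA parameters (binary tournament selection, one-point crossover, bit-flip mutation, elitism) are hypothetical choices, not taken from the paper.

```python
import random
from itertools import product

# Hypothetical toy instance (not from the paper): 5 candidate views.
# MAINT[i] is the maintenance cost of view i, assumed additive here.
MAINT = [4, 3, 2, 5, 1]
BUDGET = 6
# Each query: (frequency, cost from base data, {view: cost if answered from that view}).
# OR view graph: any single listed view suffices to answer the query.
QUERIES = [
    (10, 100, {0: 20, 1: 50}),
    (5, 80, {1: 10, 2: 30}),
    (8, 60, {2: 15, 3: 5}),
    (3, 40, {4: 10}),
]


def query_benefit(selected, queries):
    """Frequency-weighted reduction in query cost for a selection (list of bools)."""
    total = 0
    for freq, base_cost, options in queries:
        best = base_cost
        for view, cost in options.items():
            if selected[view] and cost < best:
                best = cost  # use the cheapest materialized alternative
        total += freq * (base_cost - best)
    return total


def fitness(selected, queries, maint, budget):
    used = sum(m for m, s in zip(maint, selected) if s)
    if used > budget:
        return -(used - budget)  # infeasible: ranks below every feasible selection
    return query_benefit(selected, queries)


def select_views_ga(queries, maint, budget, pop_size=20, generations=60,
                    p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(maint)

    def score(ind):
        return fitness(ind, queries, maint, budget)

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([False] * n)  # the empty selection is always feasible
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each gene with probability p_mut.
            child = [not g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)  # never worse: the elite copy is in pop
    return best, score(best)


def select_views_exhaustive(queries, maint, budget):
    """Brute-force optimum over all 2^n selections (only viable for tiny n)."""
    candidates = (list(bits) for bits in product([False, True], repeat=len(maint)))
    best = max(candidates, key=lambda ind: fitness(ind, queries, maint, budget))
    return best, fitness(best, queries, maint, budget)


if __name__ == "__main__":
    ga_sel, ga_val = select_views_ga(QUERIES, MAINT, BUDGET)
    opt_sel, opt_val = select_views_exhaustive(QUERIES, MAINT, BUDGET)
    print("GA:", ga_sel, ga_val, "optimum:", opt_sel, opt_val)
```

Exhaustive search, as in `select_views_exhaustive`, is only practical for a handful of views because it enumerates all 2^n selections; that exponential growth is what motivates a randomized search such as a genetic algorithm.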
In order to access information from a variety of heterogeneous information sources, one has to be able to translate queries and data from one data model into another. This functionality is provided by so-called (source) wrappers [4,8], which convert queries into one or more commands or queries understandable by the underlying source and transform the native results into a format understood by the application. As part of the TSIMMIS project [1,6] we have developed hard-coded wrappers for a variety of sources (e.g., Sybase DBMS, WWW pages), including legacy systems (Folio). However, anyone who has built a wrapper can attest that a lot of effort goes into developing and writing one. In situations where it is important or desirable to gain access to new sources quickly, this is a major drawback. Furthermore, we have observed that only a relatively small part of the code deals with the specific access details of the source. The rest of the code is either common among wrappers or implements query and data transformations that could be expressed in a high-level, declarative fashion.
Based on these observations, we have developed a wrapper implementation toolkit [7] for quickly building wrappers. The toolkit contains a library of commonly used functions, such as receiving queries from the application and packaging results. It also contains a facility for translating queries into source-specific commands, and for translating results into a model useful to the application.
The philosophy behind our "template-based" translation methodology is as follows. The wrapper implementor specifies a set of templates (rules), written in a high-level declarative language, that describe the queries accepted by the wrapper as well as the objects that it returns.
If an application query matches a template, an implementor-provided action associated with the template is executed to produce the native query for the underlying source. When the source returns the result of the query, the wrapper transforms the answer, which is represented in the data model of the source, into the representation used by the application. Using this toolkit, one can quickly design a simple wrapper with a few templates that cover part of the desired functionality, probably the part that is most urgently needed; further templates can be added gradually as more functionality is required.
Another important use of wrappers is extending the query capabilities of a source. For instance, some sources may not be capable of answering queries that have multiple predicates. In such cases, the wrapper must pose a native query to the source using only the predicates that the source can handle. The remaining predicates are automatically separated from the user query and form a filter query. When the wrapper receives the results, a post-processing engine applies the filter query, ...
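The template-matching and filter-query steps can be illustrated with a minimal Python sketch. It is not the TSIMMIS template language, which is declarative; here a "template" is reduced to the set of attributes a source accepts in one request plus an implementor-provided function that builds the native command. The `Template` class, `plan`, `post_process`, the restriction to equality predicates, and the `find author ...` command syntax are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Predicate = Tuple[str, str]  # (attribute, value); equality predicates only, for simplicity


@dataclass
class Template:
    """Hypothetical template: attributes the source can filter on in one native request."""
    attrs: frozenset
    action: Callable[[Dict[str, str]], str]  # implementor-provided: builds the native command


def plan(query: List[Predicate], templates: List[Template]) -> Tuple[str, List[Predicate]]:
    """Match the query against the templates; return (native command, filter query)."""
    query_attrs = {attr for attr, _ in query}
    # A template matches if the query supplies a value for every attribute it needs.
    matching = [t for t in templates if t.attrs <= query_attrs]
    if not matching:
        raise ValueError("no template matches the query")
    chosen = max(matching, key=lambda t: len(t.attrs))  # push as much as possible to the source
    native = {attr: value for attr, value in query if attr in chosen.attrs}
    filter_query = [(attr, value) for attr, value in query if attr not in chosen.attrs]
    return chosen.action(native), filter_query


def post_process(rows: List[Dict[str, str]], filter_query: List[Predicate]) -> List[Dict[str, str]]:
    """Apply the predicates the source could not evaluate to the returned rows."""
    return [row for row in rows if all(row.get(attr) == value for attr, value in filter_query)]


# Hypothetical source that can only search by a single attribute, the author.
TEMPLATES = [Template(frozenset({"author"}), lambda b: f"find author {b['author']}")]
```

For a query on author and year against this source, `plan` sends only the author predicate to the source and returns the year predicate as the filter query, which `post_process` then applies to the returned rows. Picking the matching template that covers the most predicates is one reasonable policy, assumed here; it pushes as much selection work as possible to the source.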
A warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition and the base data are now decoupled. We show that this decoupling can result in anomalies if traditional algorithms are applied. We introduce a new algorithm, ECA (for "Eager Compensating Algorithm"), that eliminates the anomalies. ECA is based on previous incremental view maintenance algorithms, but extra "compensating" queries are used to eliminate anomalies. We also introduce two streamlined versions of ECA for special cases of views and updates, and we present an initial performance study that compares ECA to a view recomputation algorithm in terms of messages transmitted, data transferred, and I/O costs.
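The anomaly and the compensating fix can be reproduced with a small simulation. This is a hedged sketch, not the full ECA algorithm: it assumes bag semantics, a single view V = pi_W(r1 join r2) over hypothetical relations r1(W, X) and r2(X, Y), and two inserts U1 (into r2) and U2 (into r1) that both reach the source before the warehouse's first query is evaluated; ECA's bookkeeping of pending queries is omitted. The function names `pi_w_join` and `simulate` are illustrative only. Without compensation, the tuple produced jointly by both updates is counted twice; subtracting Q1 with r1 replaced by U2 (the compensating term) removes the duplicate.

```python
from collections import Counter


def pi_w_join(r1, r2):
    """Evaluate pi_W(r1 join r2) on bags: r1 holds (W, X) tuples, r2 holds (X, Y) tuples."""
    out = Counter()
    for (w, x1), c1 in r1.items():
        for (x2, _y), c2 in r2.items():
            if x1 == x2:
                out[w] += c1 * c2
    return out


def simulate(compensate):
    """Return (view computed at the warehouse, view recomputed from the final source state)."""
    # Hypothetical source state; the warehouse view starts out correct (empty).
    r1 = Counter({(1, 2): 1})
    r2 = Counter()
    view = pi_w_join(r1, r2)

    u1 = Counter({(2, 3): 1})  # U1: insert (2, 3) into r2
    u2 = Counter({(4, 2): 1})  # U2: insert (4, 2) into r1

    r2.update(u1)  # source applies U1; warehouse sends Q1 = pi_W(r1 join U1)

    def q1(s1, s2):
        return pi_w_join(s1, u1)

    r1.update(u2)  # source applies U2 before Q1 is evaluated; warehouse sends Q2

    def q2(s1, s2):
        answer = pi_w_join(u2, s2)  # Q2 = pi_W(U2 join r2)
        if compensate:
            # Compensating term: Q1 with r1 replaced by U2, i.e. pi_W(U2 join U1).
            answer.subtract(pi_w_join(u2, u1))
        return answer

    # Both queries are evaluated against the *current* source state.
    for query in (q1, q2):
        view.update(query(r1, r2))  # Counter.update adds counts, including negative ones
    return +view, pi_w_join(r1, r2)  # unary + drops zero and negative counts
```

Calling `simulate(False)` yields a warehouse view in which W = 4 appears twice, whereas recomputing the view from the source contains it once; `simulate(True)` matches the recomputation.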
An approach to accommodating semantic heterogeneity in a federation of interoperable, autonomous, heterogeneous databases is presented. A mechanism is described for identifying and resolving semantic heterogeneity while at the same time honoring the autonomy of the database components that participate in the federation. A minimal, common data model is introduced as the basis for describing sharable information, and a three-pronged facility for determining the relationships between information units (objects) is developed. Our approach serves as a basis for the sharing of related concepts through partial schema unification without the need for a global view of the data that is stored in the different components. The mechanism presented here can be seen in contrast with more traditional approaches such as "integrated databases" or "distributed databases". An experimental prototype implementation has been constructed within the framework of the Remote-Exchange experimental system.