Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes gigabytes of data, takes hours or even days to execute, and involves many complex, user-defined transformations of the data (e.g., finding duplicates, resolving data inconsistencies, and adding unique keys). If the load fails, one possible approach is to "redo" the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation or rely on the specifics of the transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on high-level properties of the transformations. Experiments using commercial software show that DR can reduce resumption time ten-fold.