Abstract. Data warehouses are traditionally refreshed in a periodic manner, most often on a daily basis. Thus, there is some delay between a business transaction and its appearance in the data warehouse. The most recent data is trapped in the operational sources, where it is unavailable for analysis. For timely decision making, today's business users ask for ever fresher data. Near real-time data warehousing addresses this challenge by shortening the data warehouse refreshment intervals and hence delivering source data to the data warehouse with lower latency. One consequence is that data warehouse refreshment can no longer be performed in off-peak hours only. In particular, the source data may be changed concurrently with data warehouse refreshment. In this paper we show that anomalies may arise under these circumstances, leading to an inconsistent state of the data warehouse, and we propose approaches to avoid refreshment anomalies.
Keywords: Near real-time data warehousing, Change Data Capture (CDC), Extract-Transform-Load (ETL), incremental loading of data warehouses.
Near Real-Time Data Warehousing

Data warehousing is a prominent approach to materialized data integration. Data of interest, scattered across multiple heterogeneous sources, is integrated into a central database system referred to as the data warehouse. Data integration proceeds in three steps: data of interest is first extracted from the sources, subsequently transformed and cleansed, and finally loaded into the data warehouse. Dedicated systems referred to as Extract-Transform-Load (ETL) tools have been built to support these data integration steps. The data warehouse facilitates complex data analyses without placing a burden on the operational source systems that run the day-to-day business. In order to catch up with data changes in the operational sources, the data warehouse is refreshed in a periodic manner, usually on a daily basis.
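The three ETL steps applied during a periodic refreshment cycle can be sketched as follows. This is a minimal illustration, not an actual ETL tool: all names (extract, transform, load, the row layout, the last_refresh watermark) are hypothetical, and the in-memory dictionary stands in for the warehouse's relational store.

```python
# Illustrative sketch of the Extract-Transform-Load steps for periodic
# data warehouse refreshment. All identifiers are hypothetical.

def extract(source_rows, last_refresh):
    """Extract: pull only rows changed since the previous refresh."""
    return [r for r in source_rows if r["updated"] > last_refresh]

def transform(rows):
    """Transform/cleanse: normalize values and drop incomplete records."""
    return [
        {"id": r["id"], "amount": round(r["amount"], 2)}
        for r in rows
        if r.get("amount") is not None
    ]

def load(warehouse, rows):
    """Load: upsert the transformed rows into the warehouse table."""
    for r in rows:
        warehouse[r["id"]] = r

# One periodic (e.g. nightly) refreshment cycle:
source = [
    {"id": 1, "amount": 10.5,  "updated": 5},
    {"id": 2, "amount": None,  "updated": 7},  # incomplete, cleansed away
    {"id": 3, "amount": 3.337, "updated": 2},  # unchanged since last refresh
]
warehouse = {}
load(warehouse, transform(extract(source, last_refresh=4)))
print(warehouse)  # {1: {'id': 1, 'amount': 10.5}}
```

Note that the extract step relies on a change-detection criterion (here a simple timestamp watermark); Change Data Capture techniques generalize this idea, and the anomalies discussed in this paper arise precisely when source rows change while such a cycle is in progress.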