SUMMARY
The successful adaptation of information integration techniques to the requirements of data Grids is essential for the proliferation of Grid technology. In addition to the well-known problems encountered when integrating heterogeneous sources, the dynamic Grid environment introduces new challenges. This paper discusses the problem of data source discovery, i.e. the selection of the most useful data sources for a given information demand out of a possibly very large set of candidates. We introduce the concept of data source utility and emphasize the pivotal role of semantic correspondences or schema matches for utility. Different variants of concrete utility measures used in an advanced Grid data source registry are presented.
Many model management tasks, e.g., schema matching or merging, require the manual handling of metadata. Given the diversity of metadata, its many different representations and modes of manipulation, meta-model- and task-specific editors usually have to be created from scratch with a considerable investment in time and effort. To ease the creation of custom-tailored editing facilities, we present GEM, a generic editor capable of visualizing and editing arbitrary metadata in an integrated manner. GEM provides a stylesheet language based on graph transformations to customize both the mode of visualization and the available manipulation operations.
Keywords: Model management · Graph transformations · Model visualization
The importance of metadata
The vision of generic model management [1,12] aims at reducing the effort to create metadata-intensive applications by defining generic operators that work on entire models and providing a model management system that implements these operators. Metadata-intensive applications can then be built on these systems like data-intensive applications are built on database management systems today. Examples of such applications include the broad area of information integration or the development of complex software systems.
This paper is an extended and revised version of an earlier work presented at the BTW 2009 [7].
In our research group, we work on novel approaches to create and maintain information integration systems. Creating an integration system subsumes numerous tasks, which all require the handling of metadata artifacts: Integrated schemas are designed from scratch or created by merging the source schemas. Semantic correspondences between the schemas have to be identified and made explicit by schema matching.
Based on these correspondences or "matches", mappings that perform the required data transformations have to be developed, e.g., by configuring wrappers of a federated DBMS and specifying view definitions over the wrapped sources, or by creating ETL scripts for replication-based integration. Existing integration systems need intensive maintenance operations: Changes to system components require the modification of matches and mappings.
More than thirty years of research have resulted in numerous approaches to automate some of these tasks, like automatic schema matching and merging techniques. However, for the foreseeable future, these approaches can at best be used in a semi-automatic fashion, therefore requiring human expertise to review, correct, and amend their results. Other tasks, like the design of schemas and software artifacts, are intrinsically manual. Human integration experts and software engineers therefore have to be provided with suitable interfaces to manipulate the many different kinds of metadata required for these tasks: Database schemas are often designed using conceptual metamodels like one of the many E/R variants, and are only later mapped to physical schemas, represented by a data definition language of the respective data model like SQL DDL or XSD...
SUMMARY
One necessary factor in the success of data grid technology to share structured and semi-structured data across organizational boundaries is the availability of convenient, easy-to-use information integration services that provide an application-specific, integrated view over the relevant data sources. To deal with the many challenges of information integration in such a dynamic environment, the integrated handling of metadata is an essential factor. In this paper, we present the PALADIN integration framework, a platform providing the common infrastructure on top of which the different integration services can be built.