Schema matching is a fundamental issue to many database applications, such as query mediation and data warehousing. It becomes a challenge when different vocabularies are used to refer to the same real-world concepts. In this context, a convenient approach, sometimes called extensional, instance-based or semantic, is to detect how the same real world objects are represented in different databases and to use the information thus obtained to match the schemas. Additionally, we argue that automatic approaches of schema matching should store provenance data about matchings. This paper describes an instance-based schema matching technique for an OWL dialect and proposes a data model for storing provenance data. The matching technique is based on similarity functions and is backed up by experimental results with real data downloaded from data sources found on the Web.
Abstract. One of the design principles that can stimulate the growth and increase the usefulness of the Web of data is URIs linkage. However, the related URIs are typically in different datasets managed by different publishers. Hence, the designer of a new dataset must be aware of the existing datasets and inspect their content to define sameAs links. This paper proposes a technique based on probabilistic classifiers that, given a datasets S to be published and a set T of known published datasets, ranks each Ti ∈ T according to the probability that links between S and Ti can be found by inspecting the most relevant datasets. Results from our technique show that the search space can be reduced up to 85%, thereby greatly decreasing the computational effort.
A catalogue holds information about a set of objects, typically classified using terms taken from a given thesaurus, and described with the help of a set of attributes. Matching a pair of catalogues means to find a relationship between the terms of their thesauri and a relationship between their attributes. This paper first introduces a matching approach, based on the notion of similarity, that applies to both thesauri and attribute matching. It then describes matchings based on mutual information and introduces variations that explore certain heuristics. Finally, it discusses experimental results that evaluate the precision of the matchings and that measure the influence of the heuristics.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.