21st International Conference on Data Engineering (ICDE'05)
DOI: 10.1109/icde.2005.126

Schema Matching Using Duplicates

Abstract: Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names. Discovering duplicates among data sets with unaligned schemas is more difficult than in th…
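The second step described in the abstract, using known duplicates to align columns whose names carry no meaning, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the duplicate pairs have already been found, uses Python's `difflib.SequenceMatcher` as a stand-in field similarity, and resolves conflicts with a simple greedy one-to-one assignment. The function names, the `threshold` parameter, and the sample rows are all hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for any field-level measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_attributes(rows_a, rows_b, duplicates, threshold=0.5):
    """Map column indexes of table A to column indexes of table B.

    rows_a, rows_b: lists of tuples of strings; columns are positional
        because the column names are assumed to be opaque.
    duplicates: list of (i, j) pairs meaning rows_a[i] and rows_b[j]
        describe the same real-world entity (assumed to be given).
    """
    n_a, n_b = len(rows_a[0]), len(rows_b[0])

    # Average the field similarity of every column pair over all duplicates.
    scores = [[0.0] * n_b for _ in range(n_a)]
    for i, j in duplicates:
        for ca in range(n_a):
            for cb in range(n_b):
                sim = field_similarity(rows_a[i][ca], rows_b[j][cb])
                scores[ca][cb] += sim / len(duplicates)

    # Greedy one-to-one assignment: take the best-scoring column pairs first.
    ranked = sorted(
        ((scores[ca][cb], ca, cb) for ca in range(n_a) for cb in range(n_b)),
        reverse=True,
    )
    matching, used_b = {}, set()
    for score, ca, cb in ranked:
        if score >= threshold and ca not in matching and cb not in used_b:
            matching[ca] = cb
            used_b.add(cb)
    return matching


# Hypothetical data: the same two people, stored with different column orders.
rows_a = [("Ann Lee", "Berlin", "1980"), ("Bob Ray", "Paris", "1975")]
rows_b = [("1975", "Bob Ray", "paris"), ("1980", "Anne Lee", "Berlin")]
duplicates = [(0, 1), (1, 0)]

matching = match_attributes(rows_a, rows_b, duplicates)
print(matching)
```

Columns that hold the same kind of value agree closely on duplicate rows, so their averaged similarity stands out even when the column labels are uninformative. A production matcher would likely replace the greedy step with an optimal bipartite assignment.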

Citations: cited by 143 publications (141 citation statements)
References: 24 publications (25 reference statements)
“…It is based on the cosine similarity that doesn't automatically discard words which are not strictly identical. This metric has two main advantages: 1) the token order is not important, 2) common uninformative words don't greatly affect similarity [15], [16]. The SoftTFIDF similarity measure can be computed between s and t as follows:…”
Section: B Similarity Function Assignment (mentioning)
confidence: 99%
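The two advantages named in the statement above can be illustrated with a plain TF-IDF cosine similarity. This is a sketch of the underlying cosine measure only, not of SoftTFIDF itself, and the formula elided in the quoted statement is not reproduced here. The smoothed IDF, the logarithmic term-frequency weighting, the function name `tfidf_cosine`, and the sample corpus are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_cosine(s: str, t: str, corpus: list[str]) -> float:
    """Cosine similarity of the TF-IDF vectors of s and t.

    IDF is estimated from `corpus` with add-one smoothing, so a token that
    occurs in every corpus string receives weight zero.
    """
    docs = [d.lower().split() for d in corpus]
    n = len(docs)
    # Document frequency: in how many corpus strings each token occurs.
    df = Counter(w for d in docs for w in set(d))

    def idf(w: str) -> float:
        return math.log((n + 1) / (df.get(w, 0) + 1))

    def vector(text: str) -> dict:
        tf = Counter(text.lower().split())
        weights = {w: math.log(1 + c) * idf(w) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        # L2-normalise; an all-zero vector is returned as empty.
        return {w: v / norm for w, v in weights.items()} if norm else {}

    vs, vt = vector(s), vector(t)
    return sum(weight * vt.get(w, 0.0) for w, weight in vs.items())


# Hypothetical corpus in which "the" occurs in every string.
corpus = [
    "the university of berlin",
    "the berlin university",
    "the university of paris",
    "the school of paris",
]

reordered = tfidf_cosine("university berlin", "berlin university", corpus)
with_common = tfidf_cosine("the berlin university", "berlin university", corpus)
disjoint = tfidf_cosine("berlin", "paris", corpus)
```

Reordering the tokens leaves the score unchanged because the vectors are bags of words, and adding "the" changes nothing because its IDF weight is zero in this corpus. Strings with no shared token score zero under this plain measure, which is the limitation a soft variant relaxes by also crediting tokens that are similar but not identical.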
“…Bilke and Naumann [Bilke and Naumann 2005] propose a semantic technique based on an analysis of duplicated instances. Leme et al. introduced the notion of a contextualized vocabulary matching between a source ontology and a target ontology using a finite set of quadruples as the specification model; and also proposed a semantic schema matching technique based on similarity functions.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this approach, no assumptions of common domains, global schema, underlying generative ontology, or other simplifications are made. Data are treated simply as opaque objects; the search process is purely syntactically and structurally driven [1,11]. The user-provided source and target instances provide the initial matches which drive the search process.…”
Section: Example 2 Consider the Basic Transformations Involved In Res… (mentioning)
confidence: 99%
“…The approaches to data mapping most closely related to our data mapping solution are the works of Bilke and Naumann [1] and Kang and Naughton [11] on schema matching, and the Clio project [15] on schema mapping. To our knowledge, these works have not considered the full space of data-metadata transformations, with only the Clio [15] project considering any aspects of such mappings.…”
Section: Related Work (mentioning)
confidence: 99%