Proceedings of the 5th ACM International Workshop on Web Information and Data Management 2003
DOI: 10.1145/956699.956719

Finding similar identities among objects from multiple web sources

Abstract: When integrating data from multiple Web sources, objects can exist in different formats and structures, making it difficult to identify those that can be matched together. In this paper, we propose an identification approach to finding similar identities among objects from multiple Web sources. In this approach, object identification works like the relational join operation where a similarity function takes the place of the equality condition. This similarity function is based on information retrieval techniqu…
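
The join-with-similarity idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the IDF-weighted term vectors, the cosine measure, the 0.5 threshold, and every function name below are assumptions chosen only to show how a similarity predicate can stand in for the equality condition of a join.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real parsing.
    return text.lower().split()


def idf_weights(docs):
    # Smoothed inverse document frequency over all objects from both sources.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def vectorize(text, idf):
    # Sparse term vector: term frequency times IDF weight.
    tf = Counter(tokenize(text))
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_join(left, right, threshold=0.5):
    # Nested-loop join where "similarity >= threshold" replaces "left == right".
    # The threshold value is a hypothetical default, not taken from the paper.
    idf = idf_weights(left + right)
    right_vectors = [vectorize(obj, idf) for obj in right]
    pairs = []
    for i, obj in enumerate(left):
        left_vector = vectorize(obj, idf)
        for j, right_vector in enumerate(right_vectors):
            score = cosine(left_vector, right_vector)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

Calling `similarity_join(source_a, source_b)` on two lists of textual object descriptions returns index pairs with their scores. A nested loop is used here for clarity only; it compares every pair and would not scale to large sources.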

Cited by 26 publications (11 citation statements)
References 5 publications
“…Our approach differs from others in the literature since it can be used to identify and match objects more complexly structured (e.g., XML documents) and not only objects with a flat structure such as relations. The effectiveness of our approach has been demonstrated by means of experiments with real Web data sources from different domains, whose results have reached precision levels above 75% [17].…”
Section: Web Data Integration (mentioning)
confidence: 96%
“…We have also worked on the problem of integrating data from multiple Web sources [17]. We consider Web sources with objects that can have different formats and structures, which makes it difficult to identify those that can be matched together.…”
Section: Web Data Integration (mentioning)
confidence: 99%
“…Chaudhuri et al [12] propose a probabilistic algorithm for retrieving the K records closest to a input record, according to a fuzzy match similarity function that considers the weight of words using the Inverse Document Frequency (IDF) [10]. Carvalho and da Silva [13] also use the vector space model to calculate the similarity between objects from multiple sources. Their approach can be used to deduplicate objects with complex structures such as XML documents.…”
Section: Related Work (mentioning)
confidence: 99%
“…Among the main challenges are the problems of choosing what evidence to use, and how to find the best weighting schema to apply to the chosen evidence. This has led the research community to develop a number of alternative methods [3,4,6,7,8,9,12,18].…”
Section: Related Work (mentioning)
confidence: 99%