Proceedings of the Twelfth International Conference on World Wide Web - WWW '03 2003
DOI: 10.1145/775165.775166

Text joins in an RDBMS for web data integration

Abstract: The integration of data produced and collected across autonomous, heterogeneous web services is an increasingly important and challenging problem. Due to the lack of global identifiers, the same entity (e.g., a product) might have different textual representations across databases. Textual data is also often noisy because of transcription errors, incomplete information, and lack of standard formats. A fundamental task during data integration is matching of strings that refer to the same entity. In this paper, w…

Cited by 47 publications (67 citation statements)
References 12 publications

“…We also tried to compare to other recently published methods, such as [5,21], but none of these is available for download. In particular, we acknowledge that comparison against Unix command line tools are not completely satisfactory, as our method first builds an index of the set to be searched.…”
Section: Methods (mentioning)
confidence: 99%
“…We could not compare PETER with this algorithm as there is no implementation available. The benefit of using positional q-grams was shown in [5]. As the authors implemented a sampling-based approximation for similarity string joins, one cannot directly compare PETER to their tool.…”
Section: Related Work (mentioning)
confidence: 99%
“…Therefore, we adopt the following method, using the "n-gram" technique where n = 3 that has been experimentally shown to work well [11].…”
Section: N and N (mentioning)
confidence: 99%
“…There are many ways to evaluate the similarity between textual fields in literature [13], [7], [5]. What we apply in our solution is the TF.IDF method which is effectively used in the WHIRL system [5].…”
Section: Definition 4 Given a Distinctness Threshold D T A Column C (mentioning)
confidence: 99%