Proceedings of the Twelfth International Conference on World Wide Web (WWW '03), 2003
DOI: 10.1145/775152.775166

Text joins in an RDBMS for web data integration

Abstract: The integration of data produced and collected across autonomous, heterogeneous web services is an increasingly important and challenging problem. Due to the lack of global identifiers, the same entity (e.g., a product) might have different textual representations across databases. Textual data is also often noisy because of transcription errors, incomplete information, and lack of standard formats. A fundamental task during data integration is matching of strings that refer to the same entity. In this paper, w…
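The abstract frames the core task as matching noisy strings that refer to the same entity. Because the abstract is truncated here, the following is only an illustration of one standard way to score such matches, not a statement of this paper's method: split each string into character q-grams, weight the q-grams by tf.idf, and join pairs whose cosine similarity clears a threshold. The function names, the `#` padding, and the threshold are assumptions chosen for the sketch.

```python
import math
from collections import Counter


def qgrams(s: str, q: int = 3) -> list[str]:
    """Overlapping character q-grams; '#' padding gives the string ends their own tokens."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def tfidf_vectors(strings: list[str], q: int = 3) -> list[dict[str, float]]:
    """Unit-length tf.idf vectors over q-grams; idf is computed over `strings` itself."""
    token_lists = [qgrams(s, q) for s in strings]
    n = len(strings)
    df = Counter(t for tokens in token_lists for t in set(tokens))  # document frequency
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # A string whose q-grams all occur in every string ends up with an empty vector.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def text_join(left: list[str], right: list[str], threshold: float = 0.5, q: int = 3):
    """Hypothetical nested-loop text join: every (l, r, sim) pair with cosine >= threshold."""
    vecs = tfidf_vectors(left + right, q)  # idf taken over both inputs together
    left_vecs, right_vecs = vecs[:len(left)], vecs[len(left):]
    matches = []
    for a, va in zip(left, left_vecs):
        for b, vb in zip(right, right_vecs):
            sim = cosine(va, vb)
            if sim >= threshold:
                matches.append((a, b, round(sim, 3)))
    return matches


print(text_join(["AT&T Corporation", "IBM Research"],
                ["AT&T Corp.", "Microsoft Corp"], threshold=0.1))
```

With only four strings the idf estimates are crude, which is why the example uses a low threshold; on a realistic collection, q-grams shared by many unrelated strings (such as those from "corp") receive little weight, while rarer shared q-grams dominate the score. A nested loop is used only for clarity; it compares every pair.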

Cited by 115 publications (53 citation statements)
References 17 publications
“…K_m represents a possible join query (each relation node in the tree, or connected to a node in the tree by a zero-cost edge, represents a query atom, and each nonzero-cost edge represents a join or selection condition). Like most keyword search-over-database systems, Q generates queries that may produce relevant answers by running an approximate Steiner tree algorithm [43] to connect matching nodes in the search graph with the lowest-cost tree, and executes them and unions their results together in ranked order using a top-k query processing algorithm [16,21,27]. While the Q system combines cost components (features) derived from data as well as metadata, in this paper we focus on features that are associated with the metadata and the query (particularly those having to do with predicted schema matches) rather than those derived from specific fields in the data.…”
Section: Search and Ranking (mentioning; confidence: 99%)
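The excerpt's pipeline ends by unioning the results of several candidate join queries in ranked order. A simplified sketch of that last step, assuming each query already yields its answers sorted by ascending total cost: a lazy k-way merge returns the k cheapest answers overall without materializing every result. This is deliberately simpler than the top-k algorithms cited as [16,21,27], and `query_stream` is a hypothetical placeholder for actually executing a query.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator


def query_stream(tree_cost: float, answers: list[tuple[float, str]]) -> Iterator[tuple[float, str]]:
    """Hypothetical stand-in for running one join query: yields (total_cost, answer)
    in ascending order, where total_cost = query tree cost + the answer's own cost."""
    for answer_cost, answer in sorted(answers):
        yield tree_cost + answer_cost, answer


def ranked_union(streams: Iterable[Iterable[tuple[float, str]]], k: int) -> list[tuple[float, str]]:
    """Top-k union of several cost-sorted streams via a lazy k-way merge."""
    # heapq.merge keeps one pending item per stream in a heap, so only about
    # k items plus one per stream are read, not each query's full result.
    return list(islice(heapq.merge(*streams), k))


q1 = query_stream(1.0, [(0.1, "a1"), (0.5, "a2")])
q2 = query_stream(0.8, [(0.0, "b1"), (0.9, "b2")])
print(ranked_union([q1, q2], k=3))
```

Here the answers come back as b1, a1, a2: the cheaper query tree contributes the top answer, but its second answer (total cost 1.7) is outranked by both answers of the costlier tree.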
“…This technique is also very useful to data integration applications. A special case is the approximate join operator [17,18,39] which matches records from different files according to the degree of similarity between their fields.…”
Section: Related Work (mentioning; confidence: 99%)
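The excerpt defines an approximate join as matching records from different files by the similarity of their fields. A minimal sketch on a single string field, assuming character-bigram Jaccard similarity as the measure (the operators cited as [17,18,39] may use other measures). An inverted index over bigrams restricts comparisons to records that share at least one bigram, which loses nothing for any threshold above zero because pairs with no shared bigram have similarity 0. All names and parameters are hypothetical.

```python
from collections import defaultdict


def qgram_set(s: str, q: int = 2) -> set[str]:
    """Set of character q-grams, padded with '#' so word boundaries count."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def approximate_join(left: list[str], right: list[str], threshold: float = 0.5, q: int = 2):
    """Hypothetical approximate join on one field: keep pairs with q-gram Jaccard >= threshold."""
    right_sets = [qgram_set(r, q) for r in right]
    index: dict[str, set[int]] = defaultdict(set)  # q-gram -> positions in `right`
    for j, grams in enumerate(right_sets):
        for g in grams:
            index[g].add(j)
    result = []
    for record in left:
        grams = qgram_set(record, q)
        # Candidate generation: only right records sharing at least one q-gram.
        candidates = set().union(*(index.get(g, set()) for g in grams))
        for j in sorted(candidates):
            sim = jaccard(grams, right_sets[j])
            if sim >= threshold:
                result.append((record, right[j], round(sim, 2)))
    return result


print(approximate_join(["Jon Smith", "Mary Jones"],
                       ["John Smith", "Marie Jonas", "Bob Brown"], threshold=0.4))
```

"Bob Brown" shares no bigram with either left record, so it is never compared at all; this pruning is what keeps such joins tractable on large files.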
“…In [9,16,17], they present how to declaratively integrate similarity functions to the DBMS using an SQL interface and perform entity extraction tasks. Unfortunately, no single similarity function can always outperform the others.…”
Section: Related Work (mentioning; confidence: 99%)
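The excerpt describes integrating similarity functions into the DBMS so that matching can be expressed declaratively in SQL. A minimal sketch of the idea using Python's built-in `sqlite3` module, whose `create_function` lets SQL queries call Python functions. The function names, the tables `r1`/`r2`, and the 0.7 threshold are hypothetical, and this is not the interface of the systems cited as [9,16,17]. The sample rows also illustrate the excerpt's point that no single similarity function always wins: reordered words defeat the character-level measure, while a spelled-out ampersand defeats the word-level one.

```python
import sqlite3
from difflib import SequenceMatcher


def edit_sim(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]: difflib's Ratcliff/Obershelp ratio
    (a stand-in for an edit-based measure, not Levenshtein distance itself)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity in [0, 1]; insensitive to word order."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


conn = sqlite3.connect(":memory:")
# Register the Python functions so SQL can call them by name with two arguments.
conn.create_function("edit_sim", 2, edit_sim)
conn.create_function("token_jaccard", 2, token_jaccard)
conn.executescript("""
    CREATE TABLE r1(name TEXT);
    CREATE TABLE r2(name TEXT);
    INSERT INTO r1 VALUES ('IBM Research'), ('AT&T Labs');
    INSERT INTO r2 VALUES ('Research IBM'), ('AT and T Labs');
""")

# A declarative similarity join: keep a pair if either measure reaches 0.7.
rows = conn.execute("""
    SELECT r1.name, r2.name,
           edit_sim(r1.name, r2.name),
           token_jaccard(r1.name, r2.name)
    FROM r1, r2
    WHERE edit_sim(r1.name, r2.name) >= 0.7
       OR token_jaccard(r1.name, r2.name) >= 0.7
    ORDER BY r1.name
""").fetchall()
for row in rows:
    print(row)
```

Each of the two true matches is found by only one of the two measures, so combining them in the `WHERE` clause recovers both pairs while the unrelated cross pairs stay below the threshold.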