2006
DOI: 10.1007/11687238_46
XML Duplicate Detection Using Sorted Neighborhoods

Cited by 38 publications (33 citation statements)
References 15 publications
“…adaptive vs. constant) for maximal performance (Puhlmann, Weis & Naumann, 2006; Yan, Lee, Kan & Giles, 2007).¹⁹ A major trend has been the proposal of SN algorithms that run on distributed architectures (Ma & Yang, 2015).…” [Footnote 19: A reasonable assumption, since a window size of < 10 was found to be empirically sufficient (Hernández & Stolfo, 1998).]
Section: Sorted Neighborhood
confidence: 99%
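The snippet above refers to the classic Sorted Neighborhood (SN) method: records are given a sorting key, sorted, and then compared only within a small sliding window. The sketch below illustrates that idea only; the function names, the toy sorting key, and the crude match predicate are assumptions for illustration, not the cited papers' implementations.

```python
def sorted_neighborhood(records, key, is_match, window=10):
    """Return candidate duplicate pairs found within a sliding window.

    records  -- list of records (e.g. dicts)
    key      -- function building a sorting key from a record
    is_match -- pairwise comparison predicate
    window   -- window size; < 10 was found empirically sufficient
                in Hernández & Stolfo's experiments
    """
    ordered = sorted(records, key=key)
    pairs = []
    for i, rec in enumerate(ordered):
        # Compare each record only against the next (window - 1)
        # records in sort order, not against the whole dataset.
        for other in ordered[i + 1 : i + window]:
            if is_match(rec, other):
                pairs.append((rec, other))
    return pairs


# Tiny illustrative run (hypothetical data).
people = [
    {"name": "John Smith", "city": "Berlin"},
    {"name": "Jon Smith", "city": "Berlin"},
    {"name": "Jane Doe", "city": "Paris"},
]
dupes = sorted_neighborhood(
    people,
    key=lambda r: (r["name"][:3], r["city"]),  # crude sorting key
    is_match=lambda a, b: a["city"] == b["city"]
    and a["name"].split()[-1] == b["name"].split()[-1],
)
# dupes pairs "John Smith" with "Jon Smith" only.
```

The window keeps the number of comparisons linear in the dataset size (O(n·w) instead of O(n²)), at the cost of missing duplicates whose keys sort far apart; the multi-pass and adaptive-window variants mentioned in the citation address exactly that trade-off.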
“…al. [10] proposed a unique method that automatically restructures database objects to take full advantage of the relations among their attributes. This new structure of objects reflects the relative importance of the attributes in the database and avoids manual selection.…”
Section: Duplicate Detection Through Structure Optimization
confidence: 99%
“…Nevertheless, and because of its more general nature, their approach does not take advantage of the useful features existing in XML databases, such as the element structure or tag semantics. Only more recently has research been performed with the specific goal of discovering duplicate object representations in XML databases [5], [6], [8], [10]. These works differ from previous approaches since they were specifically designed to exploit the distinctive characteristics of XML object representations: their structure, textual content, and the semantics implicit in the XML labels.…”
Section: III
confidence: 99%
“…The problem discussed in this paper can be viewed as a graph-based deduplication problem (see [21] for a recent survey), where one of the objects under study is described in a structured form (the enterprise ontology), whereas the other is described in an unstructured fashion (forum entries, query). Recent work has started to address less rigidly structured instances, such as XML objects (e.g., [45]). We are not aware of deduplication approaches encompassing unstructured and structured data.…”
Section: Related Work
confidence: 99%