Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management 2007
DOI: 10.1145/1321440.1321483

Structure-based inference of XML similarity for fuzzy duplicate detection

Abstract: Fuzzy duplicate detection aims at identifying multiple representations of real-world objects stored in a data source, and is a task of critical practical relevance in data cleaning, data mining, or data integration. It has a long history for relational data stored in a single table (or in multiple tables with equal schema). Algorithms for fuzzy duplicate detection in more complex structures, e.g., hierarchies of a data warehouse, XML data, or graph data have only recently emerged. These algorithms use similari…

Cited by 37 publications (41 citation statements)
References 25 publications
“…Nevertheless, and because of its more general nature, their approach does not take advantage of the useful features existing in XML databases, such as the element structure or tag semantics. Only more recently has research been performed with the specific goal of discovering duplicate object representations in XML databases [5], [6], [8], [10]. These works differ from previous approaches since they were specifically designed to exploit the distinctive characteristics of XML object representations: their structure, textual content, and the semantics implicit in the XML labels.…”
Section: III (mentioning)
confidence: 99%
“…The XMLDup system first proposed in [6] uses a Bayesian Network (BN) model for XML duplicate detection. Milano et al. propose a distance measure between two XML object representations that is defined based on the concept of overlays [8].…”
Section: III (mentioning)
confidence: 99%
“…BN [38], MOMA [55], SERF [5], Active Atlas [53,54], MARLIN [11,12], Multiple Classifier System [62], Operator Trees [13], TAILOR [24], FEBRL [18,17], STEM [36], Context Based Framework [16]. Training-based … between two entities. The previously proposed approaches mostly assume that corresponding attributes from the input datasets have been determined beforehand, either manually or with the help of schema matching.…”
Section: Matchers (mentioning)
confidence: 99%
“…BN (Bayesian Network): Leitão et al. [38] propose a framework for matching XML entities based on a Bayesian network (BN) model. Bayesian networks provide a graph-based formalism to explicitly represent the dependencies among the entities of a domain.…”
Section: Framework Without Training (mentioning)
confidence: 99%
“…Nonetheless, values are usually taken into account with methods dedicated to XML change management [13,14], data integration [29,40], and XML structure-and-content querying applications [66,67], where documents tend to have similar structures (probably conforming to the same grammar [36,83]). …”
Section: Fig (mentioning)
confidence: 99%