Structure-based inference of xml similarity for fuzzy duplicate detection

Leitão, Luís; Calado, Pável; Weis, Melanie

doi:10.1145/1321440.1321483

Cited by 37 publications

(41 citation statements)

References 25 publications


“…Nevertheless, and because of its more general nature, their approach does not take advantage of the useful features existing in XML databases, such as the element structure or tag semantics. Only more recently has research been performed with the specific goal of discovering duplicate object representations in XML databases [5], [6], [8], [10]. These works differ from previous approaches since they were specifically designed to exploit the distinctive characteristics of XML object representations: their structure, textual content, and the semantics implicit in the XML labels.…”

Section: III (mentioning)

confidence: 99%


EDDDS: An Efficient Duplicate Data Detection System

Dhake¹,

S.S.²,

Y.R.³

et al. 2015

International Journal of Advanced Research in Computer and Comm


Duplicate detection is a critical task for any organization's database. Duplicates are representations of the same real-world entity or object that differ in structure or format. Duplicates can occur in relational data, in complex data, and in hierarchical data such as XML. Much prior work addresses duplicate detection in relational data, but attention has recently shifted to XML data, because XML is widely used for data storage and for data exchange between organizations. Here we present an extensive literature survey on this topic and propose a duplicate detection method that combines ideas from existing papers with some original ideas of our own. In addition to improving efficiency and effectiveness, the method also checks for typographical errors when comparing two XML elements. To test the correctness of our method, we compare it with an existing duplicate detection system, focusing on how we obtain higher precision and recall values on the various datasets we used.


Section: III (mentioning)

confidence: 99%

“…The XMLDup system first proposed in [6] uses a Bayesian Network model (BN) for XML duplicate detection. Milano et al propose a distance measure between two XML object representations that is defined based on the concept of overlays [8].…”

Section: III (mentioning)

confidence: 99%

EDDDS: An Efficient Duplicate Data Detection System

Dhake¹,

S.S.²,

Y.R.³

et al. 2015

International Journal of Advanced Research in Computer and Comm


“…BN [38] MOMA [55] SERF [5] Active Atlas [53,54] MARLIN [11,12] Multiple Classifier System [62] Operator Trees [13] TAILOR [24] FEBRL [18,17] STEM [36] Context Based Framework [16] Training-based between two entities. The previously proposed approaches mostly assume that corresponding attributes from the input datasets have been determined beforehand, either manually or with the help of schema matching.…”

Section: Matchers (mentioning)

confidence: 99%

“…BN (Bayesian Network): Leitão et al [38] propose a framework for matching XML entities based on a Bayesian network (BN) model. Bayesian networks provide a graph-based formalism to explicitly represent the dependencies among the entities of a domain.…”

Section: Framework Without Training (mentioning)

confidence: 99%

Frameworks for entity matching: A comparison

Köpcke

Rahm

2010

Data & Knowledge Engineering



Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods and their combination to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed frameworks for entity matching. Our study considers both frameworks which do or do not utilize training data to semiautomatically find an entity matching strategy to solve a given match task. Moreover, we consider support for blocking and the combination of different match algorithms. We further study how the different frameworks have been evaluated. The study aims at exploring the current state of the art in research prototypes of entity matching frameworks and their evaluations. The proposed criteria should be helpful to identify promising framework approaches and enable categorizing and comparatively assessing additional entity matching frameworks and their evaluations.


“…Nonetheless, values are usually taken into account with methods dedicated to XML change management [13,14], data integration [29,40], and XML structure-and-content querying applications [66,67], where documents tend to have similar structures (probably conforming to the same grammar [36,83]). …”

Section: Fig (mentioning)

confidence: 99%