2008
DOI: 10.1145/1670243.1670247

The pq-gram distance between ordered labeled trees

Abstract: When integrating data from autonomous sources, exact matches of data items that represent the same real-world object often fail due to a lack of common keys. Yet in many cases structural information is available and can be used to match such data. Typically the matching must be approximate since the representations in the sources differ. We propose pq-grams to approximately match hierarchical data from autonomous sources and define the pq-gram distance between ordered labeled trees as an effective and efficient…

Citations: cited by 67 publications (64 citation statements)
References: 41 publications
“…Although our inferencer provides schemas which are well defined and tied to the input documents, it could be improved. For example, the algorithm calculates the distance between XML elements used to compute the elements belonging to a complex type may be replaced with new algorithms such as the pq-gram distance calculation [16]. Another way of evolution would be to study how to apply the same processing to responses in JSON or other formats.…”
Section: Discussion (mentioning)
confidence: 99%
“…Each operation is given a user-defined fixed cost, except for the relabeling operation that employs a user-provided function that compares the values of two nodes. XyDiff [11] computes hash values for all the subtrees of the analyzed trees using DOMHASH, an efficient hashing function specifically tailored for XML subtrees. XyDiff then searches for exact matches in a bottom-up traversal and eagerly tries to expand them looking for common ancestors for the two trees, relying on the node names.…”
Section: Related Work (mentioning)
confidence: 99%
“…Augsten et al. [2] propose an estimation of the tree edit-distance for ordered trees based on pq-grams, subtrees of a fixed shape corresponding to the substrings (called q-grams) used for string similarity evaluation. pq-grams are composed by a stem made of p elements (bound by the parent-child relation) and a base of q consecutive siblings: the last element of the stem is named anchor node.…”
Section: Related Work (mentioning)
confidence: 99%
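
The stem-and-base shape described in the statement above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the usual construction in which the tree is padded with dummy nodes (labelled `*` here) so that every node anchors at least one pq-gram, and it assumes a normalized bag-based distance, 1 - 2|P1 ∩ P2| / (|P1| + |P2|), over the two profiles. The names `Node`, `pq_profile`, and `pq_distance` are hypothetical and chosen for illustration only.

```python
from collections import Counter

NULL = "*"  # dummy label used to pad the tree (an assumption of this sketch)


class Node:
    """Ordered labeled tree node (hypothetical helper type)."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def pq_profile(root, p=2, q=3):
    """Return the bag (Counter) of pq-grams of the tree rooted at `root`.

    Each pq-gram is a tuple of p stem labels followed by q base labels.
    """
    profile = Counter()

    def visit(node, ancestors):
        # Stem: the last p labels on the path ending at the anchor node.
        stem = (ancestors + (node.label,))[-p:]
        base = (NULL,) * q
        if not node.children:
            # A leaf anchors a single pq-gram whose base is all dummies.
            profile[stem + base] += 1
            return
        # Slide a window of q consecutive siblings over the children.
        for child in node.children:
            base = base[1:] + (child.label,)
            profile[stem + base] += 1
            visit(child, stem)
        # Shift the window past the last child with q - 1 trailing dummies.
        for _ in range(q - 1):
            base = base[1:] + (NULL,)
            profile[stem + base] += 1

    visit(root, (NULL,) * p)
    return profile


def pq_distance(t1, t2, p=2, q=3):
    """Normalized bag distance between two pq-gram profiles (assumed form)."""
    a, b = pq_profile(t1, p, q), pq_profile(t2, p, q)
    shared = sum((a & b).values())  # size of the bag intersection
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total
```

With this sketch, `pq_distance(Node("a", [Node("b"), Node("c")]), Node("a", [Node("b"), Node("d")]))` compares two three-node trees that differ in one leaf label. Because only pq-grams touching the changed leaf differ, part of the profile is still shared and the distance falls strictly between 0 and 1.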