A number of XML labeling methods have been proposed to store XML documents in relational databases. However, they have a vulnerable point, in insertion operations. We propose the Variable Length Endless Insertable (VLEI) code and apply it to XML labeling to reduce the cost of insertion operations. Results of our experiments indicate that a combination of the VLEI code and Dewey order is effective for handling skewed insertions.
Recently, more and more data are published and exchanged by XML on the Internet. However, different XML data sources might contain the same data but have different structures. Therefore, it requires an efficient method to integrate such XML data sources so that more complete and useful information can be conveniently accessed and acquired by users. The tree edit distance is regarded as an effective metric for evaluating the structural similarity in XML documents. However, its computational cost is extremely expensive and the traditional wisdom in join algorithms cannot be applied easily. In this paper, we propose LAX (Leaf-clustering based Approximate XML join algorithm), in which the two XML document trees are clustered into subtrees representing independent items and the similarity between them is determined by calculating the similarity degree based on the leaf nodes of each pair of subtrees. We also propose an effective algorithm for clustering the XML document for LAX. We show that it is easily to apply the traditional wisdom in join algorithms to LAX and the join result contains complete information of the two documents. We then do experiments to compare LAX with the tree edit distance and evaluate its performance using both synthetic and real data sets. Our experimental results show that LAX is more effcient in performance and more effective for measuring the approximate similarity between XML documents than the tree edit distance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.