Finding an optimal assignment between two sets of objects is a fundamental problem arising in many applications, including the matching of 'bag-of-words' representations in natural language processing and computer vision. Solving the assignment problem typically requires cubic time and its pairwise computation is expensive on large datasets. In this paper, we develop an algorithm which can find an optimal assignment in linear time when the cost function between objects is represented by a tree distance. We employ the method to approximate the edit distance between two graphs by matching their vertices in linear time. To this end, we propose two tree distances, the first of which reflects discrete and structural differences between vertices, and the second of which can be used to compare continuous labels. We verify the effectiveness and efficiency of our methods using synthetic and real-world datasets.
No abstract
The graph edit distance is an intuitive measure to quantify the dissimilarity of graphs, but its computation is $$\mathsf {NP}$$ NP -hard and challenging in practice. We introduce methods for answering nearest neighbor and range queries regarding this distance efficiently for large databases with up to millions of graphs. We build on the filter-verification paradigm, where lower and upper bounds are used to reduce the number of exact computations of the graph edit distance. Highly effective bounds for this involve solving a linear assignment problem for each graph in the database, which is prohibitive in massive datasets. Index-based approaches typically provide only weak bounds leading to high computational costs verification. In this work, we derive novel lower bounds for efficient filtering from restricted assignment problems, where the cost function is a tree metric. This special case allows embedding the costs of optimal assignments isometrically into $$\ell _1$$ ℓ 1 space, rendering efficient indexing possible. We propose several lower bounds of the graph edit distance obtained from tree metrics reflecting the edit costs, which are combined for effective filtering. Our method termed EmbAssi can be integrated into existing filter-verification pipelines as a fast and effective pre-filtering step. Empirically we show that for many real-world graphs our lower bounds are already close to the exact graph edit distance, while our index construction and search scales to very large databases.
Finding the graphs that are most similar to a query graph in a large database is a common task with various applications. A widely-used similarity measure is the graph edit distance, which provides an intuitive notion of similarity and naturally supports graphs with vertex and edge attributes. Since its computation is NP-hard, techniques for accelerating similarity search have been studied extensively. However, index-based approaches for this are almost exclusively designed for graphs with categorical vertex and edge labels and uniform edit costs. We propose a lter-veri cation framework for similarity search, which supports non-uniform edit costs for graphs with arbitrary attributes. We employ an expensive lower bound obtained by solving an optimal assignment problem. This lter distance satis es the triangle inequality, making it suitable for acceleration by metric indexing. In subsequent stages, assignment-based upper bounds are used to avoid further exact distance computations. Our extensive experimental evaluation shows that a signicant runtime advantage over both a linear scan and state-of-the-art methods is achieved.
The classical Weisfeiler-Leman algorithm aka color re nement is fundamental for graph learning and central for successful graph kernels and graph neural networks. Originally developed for graph isomorphism testing, the algorithm iteratively re nes vertex colors. On many datasets, the stable coloring is reached after a few iterations and the optimal number of iterations for machine learning tasks is typically even lower. This suggests that the colors diverge too fast, de ning a similarity that is too coarse. We generalize the concept of color re nement and propose a framework for gradual neighborhood re nement, which allows a slower convergence to the stable coloring and thus provides a more negrained re nement hierarchy and vertex similarity. We assign new colors by clustering vertex neighborhoods, replacing the original injective color assignment function. Our approach is used to derive new variants of existing graph kernels and to approximate the graph edit distance via optimal assignments regarding vertex similarity. We show that in both tasks, our method outperforms the original color re nement with only moderate increase in running time advancing the state of the art.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.