The ability to find and count subgraphs of a given network is an important, nontrivial task with multidisciplinary applicability. Discovering network motifs and computing graphlet signatures are two examples of methodologies that rely at their core on the subgraph counting problem. Here we present the g-trie, a data structure specifically designed for discovering subgraph frequencies. We produce a tree that encapsulates the structure of the entire graph set, taking advantage of common topologies in the same way a prefix tree takes advantage of common prefixes. This avoids redundancy in the representation of the graphs, thus allowing for both memory and computation time savings. We introduce a specialized canonical labeling designed to highlight common substructures and annotate the g-trie with a set of conditional rules that break symmetries, avoiding repetitions in the computation. We introduce a novel algorithm that takes as input a set of small graphs and is able to efficiently find and count them as induced subgraphs of a larger network. We perform an extensive empirical evaluation of our algorithms, focusing on efficiency and scalability on a set of diversified complex networks. Results show that g-tries are able to clearly outperform previously existing algorithms by at least one order of magnitude.
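The prefix-tree analogy can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each small graph arrives as an adjacency matrix in an already fixed vertex order, and it omits both the canonical labeling and the symmetry-breaking conditions the abstract describes. The names `GTrieNode`, `insert`, and `count_nodes` are hypothetical.

```python
class GTrieNode:
    """One trie level: a vertex described by its links to earlier vertices."""

    def __init__(self):
        self.children = {}     # key: tuple of 0/1 entries for row i, columns 0..i
        self.is_graph = False  # True if a stored graph ends at this node


def insert(root, adj):
    """Insert a graph (square 0/1 adjacency matrix) one vertex at a time.

    Assumption: vertex order is taken as given; no canonical labeling is applied.
    """
    node = root
    for i in range(len(adj)):
        key = tuple(adj[i][: i + 1])  # connections of vertex i to vertices 0..i
        node = node.children.setdefault(key, GTrieNode())
    node.is_graph = True


def count_nodes(root):
    """Count trie nodes, including the root."""
    return 1 + sum(count_nodes(child) for child in root.children.values())


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

root = GTrieNode()
insert(root, triangle)
insert(root, path3)
```

Both graphs start with the same two connected vertices, so they share the first two levels of the tree and diverge only at the third vertex. Stored separately they would need six non-root nodes; here they need four.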
Complex networks from domains like Biology or Sociology are present in many e-Science data sets. Dealing with networks can often form a workflow bottleneck as several related algorithms are computationally hard. One example is detecting characteristic patterns or "network motifs", a problem involving subgraph mining and graph isomorphism. This paper provides a review and runtime comparison of current motif detection algorithms in the field. We present the strategies and the corresponding algorithms in pseudo-code yielding a framework for comparison. We categorize the algorithms outlining the main differences and advantages of each strategy. We finally implement all strategies in a common platform to allow a fair and objective efficiency comparison using a set of benchmark networks. We hope to inform the choice of strategy and critically discuss future improvements in motif detection.
Many natural structures can be represented by complex networks. Discovering network motifs, which are overrepresented patterns of inter-connections, is a computationally hard task related to graph isomorphism. Sequential methods are hindered by an exponential growth in execution time as the size of motifs and networks increases. In this article we study the opportunities for parallelism in existing methods and propose new parallel strategies that adapt and extend one of the most efficient serial methods known from the Fanmod tool. We propose both a master-worker strategy and one with distributed control, in which we employ a randomized receiver-initiated methodology capable of providing dynamic load balancing during the whole computation process. Our strategies are capable of dealing with both exact and approximate network motif discovery. We implement and apply our algorithms to a set of representative networks and examine their scalability up to 128 processing cores. We obtain almost linear speedups, showcasing the efficiency of our proposed approach, and are able to reach motif sizes that were not previously achievable using conventional serial algorithms.
Comparing scientific production across different fields of knowledge is commonly controversial and subject to disagreement. Such comparisons are often based on quantitative indicators, such as papers per researcher, and data normalization is very difficult to accomplish. Different approaches can provide new insight, and in this paper we focus on the comparison of different scientific fields based on their research collaboration networks. We use co-authorship networks where nodes are researchers and the edges show the existing co-authorship relations between them. Our comparison methodology is based on network motifs, which are overrepresented patterns, or subgraphs. We derive motif fingerprints for 22 scientific fields based on 29 different small motifs found in the corresponding co-authorship networks. These fingerprints provide a metric for assessing similarity among scientific fields, and our analysis shows that the discrimination power of the 29 motif types is not identical. We use a co-authorship dataset built from over 15,361 publications inducing a co-authorship network with over 32,842 researchers. Our results also show that we can group different fields according to their fingerprints, supporting the notion that some fields present higher similarity and can be more easily compared.
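One way to picture a fingerprint comparison is the sketch below. The abstract does not say how fingerprints are built or compared, so everything here is an assumption for illustration: a fingerprint is taken to be a vector of per-motif counts normalized to relative frequencies, and similarity is taken to be cosine similarity. The function names are hypothetical.

```python
import math


def fingerprint(counts):
    """Normalize raw per-motif counts to relative frequencies (assumed definition)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy counts for three motif types in two hypothetical fields.
field_a = fingerprint([30, 10, 60])
field_b = fingerprint([3, 1, 6])
similarity = cosine_similarity(field_a, field_b)
```

Normalizing first makes two fields with the same motif proportions but different network sizes compare as equal, which is the property a cross-field comparison would need.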
The results of this study contribute to increasing the production of camu-camu fruits grown in dry environments and their use as a health-promoting food. © 2018 Society of Chemical Industry.
Determining the frequency of small subgraphs is an important computational task lying at the core of several graph mining methodologies, such as network motif discovery or graphlet-based measurements. In this paper we try to improve a class of algorithms available for this purpose, namely network-centric algorithms, which are based upon the enumeration of all sets of k connected nodes. Past approaches would essentially delay isomorphism tests until they had a finalized set of k nodes. In this paper we show how isomorphism testing can be done during the actual enumeration. We use a customized g-trie, a tree data structure, in order to encapsulate the topological information of the embedded subgraphs, identifying already known node permutations of the same subgraph type. With this we avoid redundancy and the need for an isomorphism test for each subgraph occurrence. We tested our algorithm, which we called FaSE, on a set of different real complex networks, both directed and undirected, showcasing that we indeed achieve significant speedups of at least one order of magnitude against past algorithms, paving the way for a faster network-centric approach.
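The enumeration step that network-centric algorithms build on can be sketched as follows. This is not FaSE itself: it is a plain ESU-style enumeration for undirected graphs, which lists each connected set of k nodes exactly once and leaves out the g-trie and any isomorphism handling. The function name and the dictionary-of-sets graph format are assumptions made for illustration.

```python
def enumerate_connected_sets(adj, k):
    """List every connected k-node set of an undirected graph exactly once.

    adj maps each node (comparable labels, e.g. ints) to the set of its neighbors.
    """
    results = []

    def extend(sub, ext, root):
        if len(sub) == k:
            results.append(frozenset(sub))
            return
        ext = set(ext)  # work on a copy so sibling branches are unaffected
        while ext:
            w = ext.pop()
            # Nodes already in the subgraph or adjacent to it.
            covered = set(sub) | {u for s in sub for u in adj[s]}
            # Only add neighbors of w that are new and larger than the root,
            # which is what prevents the same set from being produced twice.
            new_ext = ext | {u for u in adj[w] if u > root and u not in covered}
            extend(sub + [w], new_ext, root)

    for v in adj:
        extend([v], {u for u in adj[v] if u > v}, v)
    return results


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
sets_of_three = enumerate_connected_sets(graph, 3)
```

In a delayed-test approach, each set returned here would then be passed to a separate isomorphism check to decide its subgraph type; the abstract's point is that this classification can instead happen while the set is being grown.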
Computing the frequency of small subgraphs on a large network is a computationally hard task. This is, however, an important graph mining primitive, with several applications, and here we present a novel multicore parallel algorithm for this task. At the core of our methodology lies a state-of-the-art data structure, the g-trie, which represents a collection of subgraphs and allows for a very efficient sequential search. Our implementation was done using Pthreads and can run on any multicore personal computer. We employ a diagonal work sharing strategy to dynamically and effectively divide work among threads during the execution. We assess the performance of our Pthreads implementation on a set of representative networks from various domains and with diverse topological features. For most networks, we obtain a speedup of over 50 for 64 cores and an almost linear speedup up to 32 cores, showcasing the flexibility and scalability of our algorithm. This paves the way for the usage of such counting algorithms on larger subgraph and network sizes without the obligatory access to a cluster.
Network motifs are small overrepresented patterns that have been used successfully to characterize complex networks. Current algorithmic approaches focus essentially on pure topology and disregard the nature of nodes and edges. However, nodes and edges can often be classified and separated into different classes. Networks of this kind can be modeled by colored (or labeled) graphs. Here we present a definition of colored motifs and an algorithm for efficiently discovering them. We use g-tries, a specialized data structure created for finding sets of subgraphs. G-tries encapsulate common substructure, and with the aid of symmetry-breaking conditions and a customized canonization methodology, we are able to efficiently search for several colored patterns at the same time. We apply our algorithm to a set of representative complex networks, showing that it can find colored motifs and outperform previous methods.