Determining the inner organizational structure of sets of networked elements is of paramount importance to analyze real-world systems such as social, biological, or economic networks. To such an end, it is necessary to identify communities of interrelated nodes within the networks.Recently, a fuzzy community detection approach based on the minimization of a topological error functional has been proposed in the form of a gradient-based algorithm design pattern.However, the intrinsic quadratic algorithmic complexity of the procedure limits the problem size that can be efficiently treated. Here, we extend the ability of this approach to analyze larger networks resorting to parallelism. Thus, we identify the concurrency sources in the gradient-based algorithm design pattern. To determine the parallelization limits, we develop a two-dimensional performance model as a function of the number of processors and network size.The model permits to compute the maximum possible speedup. Another model is presented to find the maximum problem size tractable in a given amount of time. Application of the previous models to a set of benchmark networks shows that parallelization enhances the proposed fuzzy community detection approach in more than an order of magnitude. This allows treatment of networks with several hundred thousand nodes in a time frame of hours. KEYWORDS complex networks, fuzzy communities, machine learning, parallel algorithms, performance model
INTRODUCTIONWe live in an interconnected world. In different study domains sets of entities interact in complex forms giving rise to intricate networks of elements. These networks are usually represented as graphs where the entities define the nodes and the relationships among them define the edges. In fields as different as economics, technology, biology, social sciences, or medicine, these networks encode a wealth of information on the structure and inner workings of the associated entities. 1,2 Therefore, unveiling the information hidden in the patterns of interrelated data is of paramount importance to understand, predict, and even control the behavior of real-world networked systems.In this context, a problem of great interest is the determination of the mesoscale structure of a network. The goal here is to determine communities, groups, of densely related entities. These groups correspond to sets of nodes that interact frequently, share some common attributes, or have some similar behavior. 3 These communities of interrelated elements define the inner structure and group behavior of the networked system. Thus, for instance, community detection reveals the hierarchical structure of real networks, 4 permits identifying groups of individuals with common characteristics in social networks, 5 identifies similar customers in e-commerce systems for recommendation purposes, 6 or finds sets of functionally related brain zones in brain networks. 7Determination of groups, clusters, of similar elements is a traditional topic in pattern recognition and machine learning. In these...