Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth-first search (BFS) scheme that scales for random graphs with up to three billion vertices and 30 billion edges. Scalability was tested on IBM BlueGene/L with 32,768 nodes at the Lawrence Livermore National Laboratory. Scalability was obtained through a series of optimizations, in particular, those that ensure scalable use of memory. We use 2D (edge) partitioning of the graph instead of conventional 1D (vertex) partitioning to reduce communication overhead. For Poisson random graphs, we show that the expected size of the messages is scalable for both 2D and 1D partitionings. Finally, we have developed efficient collective communication functions for the 3D torus architecture of BlueGene/L that also take advantage of the structure in the problem. The performance and characteristics of the algorithm are measured and reported.
The traditional, serial, algorithm for finding the strongly connected components in a graph is based on depth first search and has complexity which is linear in the size of the graph. Depth first search is difficult to parallelize, which creates a need for a different parallel algorithm for this problem. We describe the implementation of a recently proposed parallel algorithm that finds strongly connected components in distributed graphs, and discuss how it is used in a radiation transport solver.
The method of discrete ordinates is commonly used to solve the Boltzmann radiation transport equation for applications ranging from simulations of fires to weapons effects. The equations are most efficiently solved by sweeping the radiation flux across the computational grid. For unstructured grids this poses several interesting challenges, particularly when implemented on distributed-memory parallel machines where the grid geometry is spread across processors. We describe an asynchronous, parallel, message-passing algorithm that performs sweeps simultaneously from many directions across unstructured grids. We identify key factors that limit the algorithm's parallel scalability and discuss two enhancements we have made to the basic algorithm: one to prioritize the work within a processor's subdomain and the other to better decompose the unstructured grid across processors. Performance results are given for the basic and enhanced algorithms implemented within a radiation solver running on hundreds of processors of Sandia's Intel Tflops machine and DEC-Alpha CPlant cluster.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.