Timothy Chapman scite author profile

Timothy Chapman

2 Publications

11 Citation Statements Received

52 Citation Statements Given

Affiliations

DigiPen Institute of Technology; University of California, Santa Cruz

Publications

Parallel algorithms for clustering biological graphs on distributed and shared memory architectures

Rytsareva, Chapman, Kalyanaraman

2014

IJHPCN

Graph algorithms on parallel architectures present an interesting case study for irregular applications. In this paper, we address one such irregular application: clustering real-world graphs constructed from biological data using parallel computers. While theoretical formulations of the clustering operation are either intractable or computationally prohibitive, efficient heuristics exist to tackle the problem in practice. Yet implementing these heuristics in a parallel setting is a significant challenge owing to a combination of factors, including irregular data access and movement patterns, dependence of the computational workload on the input, and a general need to maintain auxiliary pointer-based data structures. In this paper, we present the design and evaluation of two different parallel implementations of a popular serial graph clustering heuristic called the Shingling heuristic, originally developed by Gibson et al. Our first implementation, called pClust-sm, is an OpenMP algorithm that targets shared memory multicore platforms. Our second implementation, called pClust-mr, targets distributed memory clusters running Hadoop MapReduce. Even though both implement the same serial algorithm, their underlying implementations are vastly different owing to the differences in their target platforms and programming environments. In the shared memory implementation, we were able to improve both the asymptotic runtime and memory complexities of the serial implementation, and drastically reduce the time to solution from the order of several days to a few minutes on larger inputs (∼100M edges). We evaluated the performance on two different shared memory platforms: a commodity large memory (8-core, 32 GB) compute node, and a single node of a specialized SGI Altix UV.
With the Hadoop MapReduce implementation, while we were able to demonstrate linear scaling up to 64 cores on modest-sized inputs (∼11M edges), the runtimes were one to two orders of magnitude larger than those of the shared memory implementation. Yet this was sufficient to extend the reachable problem size by about two orders of magnitude relative to a previous serial (single-threaded) implementation, in roughly the same amount of time.
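
The shingle-generation step at the heart of the Shingling heuristic can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not the authors' pClust-sm or pClust-mr code: vertex IDs are assumed to be non-negative integers, the hash family (a*x + b) mod P is a stand-in for true random permutations, the parameters s (shingle size) and c (number of trials) and every function name are hypothetical, and map_phase / shuffle_and_reduce only mimic the shape of a MapReduce job rather than using Hadoop. It shows only how shingles are generated and grouped, not the complete clustering heuristic.

```python
import random
from collections import defaultdict

# Mersenne prime 2^31 - 1; modulus of the (assumed) hash family standing in for permutations.
P = 2_147_483_647


def make_hashes(c, seed=0):
    """Draw c hash functions h(x) = (a*x + b) mod P, one per shingling trial."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(c)]


def vertex_shingles(neighbors, s, hashes):
    """For each hash, keep the s neighbors with the smallest hash values (one shingle per trial)."""
    if len(neighbors) < s:
        return []  # too few neighbors to form a shingle of size s
    result = []
    for a, b in hashes:
        smallest = sorted(neighbors, key=lambda x: (a * x + b) % P)[:s]
        result.append(tuple(sorted(smallest)))  # canonical order so equal sets compare equal
    return result


def map_phase(adjacency, s, hashes):
    """Map-like step: emit a (shingle, vertex) pair for every shingle of every vertex."""
    for v, nbrs in adjacency.items():
        for sh in vertex_shingles(nbrs, s, hashes):
            yield sh, v


def shuffle_and_reduce(pairs):
    """Group vertices by shingle (the 'shuffle') and keep shingles shared by >= 2 vertices."""
    groups = defaultdict(set)
    for sh, v in pairs:
        groups[sh].add(v)
    return {sh: vs for sh, vs in groups.items() if len(vs) >= 2}


if __name__ == "__main__":
    adj = {1: [10, 11, 12], 2: [12, 11, 10], 3: [20, 21, 22]}
    hashes = make_hashes(c=3, seed=42)
    for sh, vs in shuffle_and_reduce(map_phase(adj, 2, hashes)).items():
        print(sh, sorted(vs))
```

Vertices with similar neighborhoods tend to produce identical shingles, so grouping by shingle surfaces candidate dense subgraphs. In a real MapReduce job, the grouping done here by a dictionary would instead be performed by the framework's shuffle, keyed on the shingle.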

An OpenMP algorithm and implementation for clustering biological graphs

Chapman, Kalyanaraman

2011

Graph algorithms on parallel architectures present an interesting case study for irregular applications. Among the graph algorithms popular in scientific computing, graph clustering or community detection has numerous applications in computational biology. However, this operation also poses serious computational challenges because of irregular memory access patterns, large memory requirements, and its dependence on other auxiliary (also irregular) data structures to supplement processing. In this paper, we address the problem of graph clustering on shared memory machines. We present a new OpenMP-based parallel algorithm called pClust-sm, which uses adjacency lists, hash tables and union-find data structures in parallel. The algorithm improves both the asymptotic runtime and memory complexities of a previous serial implementation. Preliminary results show that this algorithm can scale up to 8 threads (cores) of a shared memory machine on a real-world metagenomics input graph with 1.2M vertices and 100M edges. More importantly, the new implementation drastically reduces the time to solution from the order of several hours to just over 4 minutes, and it also extends the reachable problem size by at least one order of magnitude.
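
A common way to use a union-find structure in graph clustering is to merge vertices that share some key into connected clusters. The following is a minimal sequential sketch, assuming the input is a hash table that maps a hypothetical group key (for example, a shingle) to the set of vertices sharing it. The names UnionFind and clusters_from_groups are illustrative only, and nothing here reflects how pClust-sm organizes or parallelizes these structures with OpenMP.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Lazily register unseen elements as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # attach the smaller tree under the larger root
        self.size[rx] += self.size[ry]


def clusters_from_groups(groups):
    """Merge every vertex group into connected clusters; return sorted lists of vertices."""
    uf = UnionFind()
    for members in groups.values():
        members = list(members)
        if not members:
            continue
        uf.find(members[0])  # register even single-member groups
        for v in members[1:]:
            uf.union(members[0], v)
    clusters = defaultdict(set)
    for v in list(uf.parent):
        clusters[uf.find(v)].add(v)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    groups = {"s1": {1, 2}, "s2": {2, 3}, "s3": {7, 8}, "s4": {9}}
    print(clusters_from_groups(groups))
```

For example, the groups {1, 2} and {2, 3} end up in the same cluster because they share vertex 2, while {7, 8} and {9} remain separate. Path halving and union by size keep each operation close to constant amortized time, which matters when the number of groups is large.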
