The parallelism motifs of genomic data analysis

Yelick, Katherine; Buluç, Aydın; Awan, Muaaz Gul; Azad, Ariful; Brock, Benjamin; Egan, Rob; Ekanayake, Saliya; Ellis, Marquita; Georganas, Evangelos; Guidi, Giulia; Hofmeyr, Steven; Selvitopi, Oğuz; Teodoropol, Cristina; Oliker, Leonid

doi:10.1098/rsta.2019.0394

Cited by 13 publications

(10 citation statements)

References 72 publications

“…Our GPU optimizations effectively turn a compute-bound problem into one dominated by communication. In particular, many-to-many k-mer exchange for redistributing k-mers tends to be the secondary bottleneck at small scales and the primary bottleneck at large scale of distributed memory kmer counters [7], [10], [22], [33]. Our novel use of supermers in distributed memory parallelization is combined with GPU optimizations, improving communication costs by reducing communication volume.…”

Section: Discussion (mentioning)

confidence: 99%

Distributed-Memory k-mer Counting on GPUs

Nisa

Pandey

Ellis

et al. 2021

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

A fundamental step in many bioinformatics computations is to count the frequency of fixed-length sequences, called k-mers, a problem that has received considerable attention as an important target for shared memory parallelization. With datasets growing at an exponential rate, distributed memory parallelization is becoming increasingly critical. Existing distributed memory k-mer counters do not take advantage of GPUs for accelerating computations. Additionally, they do not employ domain-specific optimizations to reduce communication volume in a distributed environment. In this paper, we present the first GPU-accelerated distributed-memory parallel k-mer counter. We evaluate the communication volume as the major bottleneck in scaling k-mer counting to multiple GPU-equipped compute nodes and implement a supermer-based optimization to reduce the communication volume and to enhance scalability. Our empirical analysis examines the balance of communication to computation on a state-of-the-art system, the Summit supercomputer at Oak Ridge National Lab. Results show overall speedups of up to two orders of magnitude with GPU optimization over CPU-based kmer counters. Furthermore, we show an additional 1.5× speedup using the supermer-based communication optimization.
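
The k-mer counting and supermer ideas summarized in this abstract can be illustrated with a small single-process sketch. It is not the paper's GPU or distributed implementation. It assumes a supermer is a maximal run of consecutive k-mers that share the same minimizer, and it uses the lexicographically smallest length-m substring as the minimizer; the function names and the parameter m are hypothetical choices for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minimizer(kmer, m):
    """Return the lexicographically smallest length-m substring of kmer.

    Assumption: plain lexicographic order; real tools may use other orderings.
    """
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def supermers(seq, k, m):
    """Split seq into (minimizer, supermer) pairs.

    Consecutive k-mers with the same minimizer are merged into one substring,
    so they can be routed to the same destination as a single message.
    """
    out = []
    n = len(seq) - k + 1          # number of k-mers in seq
    current, start = None, 0
    for i in range(n):
        mz = minimizer(seq[i:i + k], m)
        if current is None:
            current, start = mz, i
        elif mz != current:
            # Close the run covering k-mers start..i-1.
            out.append((current, seq[start:i - 1 + k]))
            current, start = mz, i
    if current is not None:
        out.append((current, seq[start:n - 1 + k]))
    return out


def count_from_supermers(pairs, k):
    """Recover k-mer counts by re-extracting k-mers from each supermer."""
    counts = Counter()
    for _, s in pairs:
        counts.update(kmers(s, k))
    return counts


if __name__ == "__main__":
    seq, k, m = "ACGTACGT", 4, 2
    pairs = supermers(seq, k, m)
    sent_as_kmers = sum(len(x) for x in kmers(seq, k))
    sent_as_supermers = sum(len(s) for _, s in pairs)
    print(pairs)
    print(sent_as_kmers, sent_as_supermers)
```

In this toy input the five 4-mers total 20 characters, while the three supermers total 14, which is the kind of volume reduction the supermer optimization targets. In a distributed setting the minimizer, rather than each individual k-mer, would determine the destination process.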

“…In particular, the distribution of k-mers is not fixed across biological input datasets and cannot be determined until the run time. The primary methods for scalable-distributed memory k-mer counting rely on distributed hash tables [6], [7], [10], [12], [21], [33].…”

Section: Introduction (mentioning)

confidence: 99%

Distributed-Memory k-mer Counting on GPUs

Nisa

Pandey

Ellis

et al. 2021

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

“…Our algorithm will boost many applications in genomics, scientific computing, and social network analysis where SpGEMM has emerged as a key computational kernel. For example, Yelick et al [39] regarded SpGEMM as a parallelism motif of genomic data analysis with applications in alignment, profiling, clustering and assembly for both single genomes and metagenomes. With the size of genomic data growing exponentially, extreme-scale SpGEMM presented in this paper will enable rapid scientific discoveries in these applications.…”

Section: Discussion (mentioning)

confidence: 99%

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

Hussain

Selvitopi

Buluç

et al. 2020

Preprint

Self Cite

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. In this paper, we consider SpGEMMs performed on hundreds of thousands of processors generating trillions of nonzeros in the output matrix. Distributed SpGEMM at this extreme scale faces two key challenges: (1) high communication cost and (2) inadequate memory to generate the output. We address these challenges with an integrated communication-avoiding and memory-constrained SpGEMM algorithm that scales to 262,144 cores (more than 1 million hardware threads) and can multiply sparse matrices of any size as long as inputs and a fraction of output fit in the aggregated memory. As we go from 16,384 cores to 262,144 cores on a Cray XC40 supercomputer, the new SpGEMM algorithm runs 10x faster when multiplying large-scale protein-similarity matrices.
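
For readers unfamiliar with the kernel, the following is a minimal sequential sketch of SpGEMM using row-wise accumulation (often called Gustavson's algorithm). It is not the communication-avoiding, memory-constrained distributed algorithm described in the abstract. The dictionary-of-dictionaries storage format and the function name `spgemm` are assumptions made for illustration.

```python
def spgemm(A, B):
    """Multiply two sparse matrices stored as {row: {col: value}}.

    Row-wise formulation: row i of C is the sum of rows k of B,
    each scaled by the nonzero A[i][k]. Only nonzeros are touched.
    """
    C = {}
    for i, a_row in A.items():
        acc = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C


if __name__ == "__main__":
    A = {0: {0: 1, 2: 2}, 1: {1: 3}}
    B = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
    print(spgemm(A, B))
```

The output can contain far more nonzeros than either input, which is why the abstract treats output memory as a constraint alongside communication cost.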

“…Memory needs can approach the terabyte scale, and therefore, the computations commonly use compute clusters with >32 central processing units (CPUs) and >4 GB of random access memory per CPU. GPU computing and edge computing are popularly used due to the inherent single instruction multiple data (SIMD) nature of the computation. Download and storage of the data can also require petabyte-scale systems such as those hosted by the National Microbiome Data Collaborative…”

Section: Machine Learning Methods Applied To HABs (mentioning)

confidence: 99%

Toward a Predictive Understanding of Cyanobacterial Harmful Algal Blooms through AI Integration of Physical, Chemical, and Biological Data

Marrone

Banerjee

Talapatra

et al. 2023

ACS EST Water

Freshwater cyanobacterial harmful algal blooms (cyanoHABs) are a worldwide problem resulting in substantial economic losses, due to harm to drinking water supplies, commercial fishing, wildlife, property values, recreation, and tourism. Moreover, toxins produced from some cyanoHABs threaten human and animal health. Climate warming can affect the distribution of cyanoHABs, where rising temperatures facilitate more intense blooms and a greater distribution of cyanoHABs in inland freshwater. Nutrient runoff from adjacent watersheds is also a major driver of cyanoHAB formation. While some of the physicochemical factors behind cyanoHAB dynamics are known, there are still major gaps in our understanding of the conditions that trigger and sustain cyanoHABs over time. In this perspective, we suggest that sufficient data sets, as well as machine learning (ML) and artificial intelligence (AI) tools, are available to build a comprehensive model of cyanoHAB dynamics based on integrated environmental/climate, nutrient/water chemistry, and cyanoHAB microbiome and 'omics data to identify key factors contributing to HAB formation, intensity, and toxicity. By taking a holistic approach to the analysis of all available data, including the rapidly growing number of biological data sets, we can provide the foundational knowledge needed to address the increasing threat of cyanoHABs to the security of our water resources.
